ABSTRACT: A system for computer vision is presented, which is based on
two-dimensional prototypes, and which uses a hierarchy of
features for mapping purposes.

More  specifically,  we  are  dealing  with  scenes  composed  of 
planar  faced,  convex  objects.  Extensions  to  the  general 
planar  faced  case  are  discussed. 


The visual input is provided by a TV-camera, and the problem is
to interpret that input by computer, as a projection of a
three-dimensional scene.


The digitized picture is first scanned for significant intensity
gradients (called edges), which are likely to appear at region-
and object junctions. The two-dimensional scene-representation
given by the totality of such intensity discontinuities (that
word used somewhat inexactly) is referred to in the sequel as the
"edge-drawing", and constitutes the input to the vision system
presented here.


The  system  proposed  and  demonstrated  in  this  paper  utilizes 
perspectively consistent two-dimensional models (prototypes)
of  views  of  three-dimensional  objects,  and  interpretations  of 
scene-representations  are  based  on  the  establishment  of  mapping 
relationships from conglomerates of scene-elements
(line-constellations) to prototype templates. The prototypes are
learned  by  the  program  through  analysis  of  -  and  generalization 
on  -  ideal  instances. 

The  system  works  better  than  any  sequential  (or  other)  system 
presented  so  far.  It  should  be  well  suited  to  the  context  of  a 
complete  vision  system,  using  depth,  occlusion,  support  relations 
etc.  The  general  case  of  irregularly  shaped,  planar  faced 
objects,  including  concave  ones,  would  necessitate  such  an 
extended  context. 
ABSTRACT


A  system  for  computer  vision  is  presented,  which  is  based  on  two- 
dimensional  prototypes,  and  which  uses  a  hierarchy  of  features  for 
mapping  purposes. 

More  specifically,  we  are  dealing  with  scenes  composed  of  planar  faced, 
convex  objects.  Extensions  to  the  general  planar  faced  case  are 
discussed.

The visual input is provided by a TV-camera, and the problem is to
interpret that input by computer, as a projection of a three-dimensional
scene.

In  this  case  the  digitized  picture  is  first  scanned  for  significant 
intensity  gradients  (called  edges),  which  are  likely  to  appear  at 
region-  and  object  junctions.  The  two-dimensional  scene-representation 
given  by  the  totality  of  such  intensity  discontinuities  (that  word  used 
somewhat  inexactly)  is  referred  to  in  the  sequel  as  the  "edge-drawing", 
and  constitutes  the  input  to  the  vision  system  presented  here. 

If  edge-drawings  were  perfect,  the  task  of  interpreting  them,  that  is  of 
determining  the  composition  of  the  scene  (in  terms  of  partaking 
objects),  would  not  be  an  excessively  hard  one.  A  rather  simple  scheme 
of  sequential  abstractions  would  work  adequately,  obtaining  successively 
higher  levels  of  abstraction  (information  compression),  in  some  order 
like: Edges - Lines - Vertices - Regions - Bodies - Scene.

Unfortunately, edge-drawings are very seldom anywhere near perfect, due
to effects like shadows, glare, reflections, insufficient intensity
gradients  between  regions,  hardware  imperfections,  digitization  errors, 
etc.  The  sequential  approach,  therefore,  does  not  work  adequately  in 
practice.  The  need  for  more  global  information  (even  at  low  levels  of 
abstraction)  has  become  more  evident  with  every  effort  put  into  the 
development  of  sequential  vision  schemes. 

The  system  proposed  and  demonstrated  in  this  paper  utilizes 
perspectively consistent two-dimensional models (prototypes) of views of
three-dimensional  objects,  and  interpretations  of  scene-representations 
are  based  on  the  establishment  of  mapping  relationships  from 
conglomerates  of  scene-elements  (line-constellations)  to  prototype 
templates.  The  prototypes  are  learned  by  the  program  through  analysis 
of  -  and  generalization  on  -  ideal  instances. 

A small hierarchy of features (specific line- and vertex constellations)
is  used  in  providing  entry-points  (keys)  into  such  mappings,  since  an 
exhaustive  search  is  out  of  the  question  (for  reasons  of  combinatorics). 

Features are also used during the process of mapping scene-elements onto
a  prototype,  serving  now  as  guides  and  templates. 

This system is intermediate-level in the sense that it does not work on
the basis of the original TV-image (but on information abstracted from
it), and that it does not determine (or use) spatial dimensions,
positions, or relationships of the objects in a given scene.


Its  place  in  an  extended  three-dimensional  system  is  discussed,  as  are 
some  possible  aspects  of  such  a  system. 

The results obtained are quite good, using scenes of realistic
complexity and with many examples of different kinds of imperfections in
the initial data.

In  conclusion,  the  system  works  better  than  any  sequential  (or  other) 
system  presented  so  far.  It  should  be  well  suited  to  the  context  of  a 
complete  vision  system,  using  depth,  occlusion,  support  relations,  etc. 
The  general  case  of  irregularly  shaped,  planar  faced  objects,  including 
concave  ones,  would  necessitate  such  an  extended  context. 

1.0  INTRODUCTION 

In  the  context  of  the  present  paper,  the  term  "computer  vision"  is 
restricted  in  scope  to  a  world  of  planar-faced  solids,  notably 
parallelepipeds, wedges, and other simple objects that may be expected
to  be  useful  in  a  "Hand-Eye"  context,  for  instance  as  building  blocks. 

This paper deals more or less exclusively with vision at an intermediate
level, viz. our "input" is an array of "brightness-discontinuity points"
(over a digitized TV-raster representing our view of the scene). Our
"output"  is  a  formalized  interpretation  of  that  information  as  a  two- 
dimensional  representation  of  a  three-dimensional  scene. 


Statement of the problem:

On a table is a collection of blocks. "Looking" at that scene is a
TV-camera, which is linked to the computer, so that the latter may obtain
a digitized raster of the image. The problem, of course, is to program
the computer so as to enable it to output an interpretation of that
TV-image, in terms of the nature and relationships of the objects in the
scene. This interpretation may then be the final product in itself, or
it may be used by other programs for purposes of manipulating the
objects in the scene.


At this early point I suggest that the reader take a look at some of
the examples in Section 11, in order to get a more concrete idea of what
this is all about.

This paper first touches on some related efforts toward solving the
vision problem, and on the pros and cons of sequential (abstraction)
versus model driven schemes. Its main body presents a system based on
prototypes and features, which uses global knowledge and goal-direction
to a higher extent than has previously been tried.


The  basic  lay-out  of  the  present  paper  is  the  following: 

Immediately  after  this  introduction  you  will  find  a  list  of  commonly 
used  terms  and  abbreviations  (Section  2).  I  recommend  a  quick  scan 
through  that  list  before  reading  the  rest  of  this  presentation,  but  the 
main  use  of  the  list  should  be  for  easy  reference  whenever  unfamiliar 
terms  are  encountered  throughout  this  paper. 

Section  3  deals  with  previous  and  related  efforts  in  this  area  of 
interest,  and  contains  a  discussion  of  the  difficulties  inherent  in 
"sequential-abstraction"  methods,  contrasting  that  approach  with 
"global-knowledge"  schemes,  particularly  model-based  ones. 

Section  4  through  Section  12,  the  main  part  of  this  presentation, 
describe a model-based, feature-driven, intermediate-level vision
system. Examples of system performance are provided, as well as a
discussion  of  future  possibilities,  and  the  usual  conclusions.  There  is 
also  an  appendix  containing  a  description  of  the  general  data-structure. 

The  main  part  of  the  thesis  is  presented  according  to  the  following 
plan:

First an overview of general considerations and basic strategies,
Section 4. Then a thorough discussion of the feature hierarchy
(definitions - properties - utilization), Section 5. Section 6 then
introduces the prototype concept, in like manner and considerable
detail.

Having thus established the conceptual machinery, we then embark on a
description of the process of interpreting the scene.

Preprocessing is dealt with in Section 7. The parsing process (which
utilizes the prototype matching program in interpreting the scene) is
presented in Section 8. The matching process is given in the following
section, which is rather technical and relies on a hierarchy of
block-diagrams for the presentation of the flow of process.

I have found it difficult to give a transparent account of the matching
program,  and  I  ask  the  reader’s  indulgence,  should  she/he  find  the 
presentation  hard  to  absorb  at  first  glance. 

Section  10  discusses  a  possible  object  completion  phase.  This  is 
followed  by  examples  of  system  performance  (Section  11),  and  discussion 
of  future  possibilities  (Section  12).  The  last  three  sections  are  (in 
order of appearance): Conclusions, Appendix, and Bibliography.

For subdivisions of the above, see table of contents.

2.0 COMMON DEFINITIONS AND ABBREVIATIONS

BARE  (vertex):  A  vertex  consisting  of  a  single  line-end,  possibly  before 
insertions  of  extra  lines. 

BASE-LINE:  See  PARENT  LINE. 

CCW.: Counter clockwise.

CF:  Compound  feature  (Section  5). 

COMPLEXITY:  The  number  of  lines  involved  (in  a  vertex,  a  feature,  or  a 
prototype).

CONNECTED:  See  SIMPLY  CONNECTED. 

CONSTELLATION:  Usually  the  group  of  lines  referenced  by  a  vertex  or  a 
feature  (should  be  clear  in  each  context). 

CONVEX  (object):  An  object  where  any  line  connecting  two  points  of  the 
surface  lies  entirely  inside  the  object  or  on  its  surface. 

CUT:  A  line  (in  the  drawing)  chopping  a  small  piece  off  another  line,  in 
the formation of a vertex.

EQUALITY CLASS: Collective term for length class and parallelity class.

EQUIVALENCE CLASS: Pertaining to LF:s in a prototype (Subsection 6.3).

ILV-SYSTEM: Intermediate-Level Vision System.

LENGTH  CLASS:  Pertaining  to  lines  in  a  prototype  (Subsection  6.4). 

LF:  Line-feature  (Section  5). 

MODEL (and derivatives): Alluding to a concrete pattern (prototype) for
matching, or an abstract, driving concept, such as the idea of
"object" or "well-shaped region". Connected with the use of
global contents (cf. Subsection 3.3). Sometimes "model" is
used interchangeably with "prototype".

OBJECT:  Usually  a  physical  entity,  such  as  a  block  on  the  table.  Also 
used  for  the  perceptual  entity  of  an  internal  object 
representation.

ORBIT: Orbiting a vertex means cycling around it in a ccw. direction,
visiting  the  lines  one  by  one,  from  some  given  starting  line. 

ORBITAL  DISTANCE:  The  number  of  lines  from  (excluding)  a  given  line  up 
to (and including) another, in a ccw. direction around a vertex.

PARALLELITY  CLASS:  Pertaining  to  lines  in  a  prototype  (Subsection  6.4). 

PARENT LINE(S) (of a feature): The line (or lines) which is (are)
partaking fully (i.e. with both ends) in the feature.

PARTIAL: Short for partially matched object.

P-LINE: Prototype line.

RAY: A line-segment with only one end and a direction given or currently
referenced.


SCENE:  A  collection  of  real-world  objects,  or  the  internal 
representation  thereof. 

SEQUENTIAL (and derivatives): Usually referring to the idea of
"sequential abstractions" (cf. MODEL and Subsection 3.3). Not
used as opposite of "parallel" (processing).

SIMPLY CONNECTED: Two vertices are simply connected iff they have a line
in  common,  two  non-parallel  lines  iff  they  share  a  vertex,  two 
parallel  lines  iff  they  have  a  connecting  line. 

SUCCESSOR  LINE:  The  line  following  a  given  line,  in  the  orbit  of  a 
vertex. 


TOPOLOGY: Besides its usual meaning, it is sometimes used in conjunction
with features, namely with their normal context in mind (as
parts  of  complete  topologies). 

TRIHEDRAL: Sometimes short for TRIHEDRAL OBJECT.

TRIHEDRAL OBJECT: A planar faced object with TRIHEDRAL VERTICES only.

TRIHEDRAL VERTEX: A vertex where exactly three surfaces meet.
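
The definitions of ORBIT, SUCCESSOR LINE and ORBITAL DISTANCE given in this list can be illustrated with a minimal sketch. It is not taken from the system described in this paper: the function names, the representation of a vertex as a point with named line-ends, and the assumption that the y-axis points upward (so that a growing angle means ccw.) are all introduced here for illustration only.

```python
import math

def orbit(vertex, line_ends, start):
    """List the lines at `vertex` in ccw order, beginning with `start`.

    `line_ends` maps a line name to the (x, y) of its far end (hypothetical
    layout). Assumes y grows upward, so a growing angle means ccw.
    """
    vx, vy = vertex

    def angle(name):
        x, y = line_ends[name]
        return math.atan2(y - vy, x - vx) % (2 * math.pi)

    ordered = sorted(line_ends, key=angle)
    i = ordered.index(start)
    return ordered[i:] + ordered[:i]


def successor_line(vertex, line_ends, line):
    """The line following `line` in the orbit of the vertex."""
    around = orbit(vertex, line_ends, line)
    return around[1 % len(around)]


def orbital_distance(vertex, line_ends, from_line, to_line):
    """Lines counted ccw from (excluding) `from_line` up to (including) `to_line`.

    Assumes the two lines are distinct.
    """
    return orbit(vertex, line_ends, from_line).index(to_line)


# Three lines meeting at the origin (illustrative data).
ends = {"a": (1, 0), "b": (0, 1), "c": (-1, -1)}
print(orbit((0, 0), ends, "b"))
print(orbital_distance((0, 0), ends, "b", "a"))
```

With this representation the successor line is simply the line at orbital distance 1.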



3.0  APPROACHES  TO  THE  VISION  PROBLEM 

In  most  fields  of  science  the  established  pattern  of  research  has  been 
the  selection  and  investigation  of  subproblems,  rather  than  broad 
frontal attacks on complex systems. Of course, practicality oftentimes
prompts such policies, and mostly the results achieved are relevant to
the  understanding  or  function  of  the  whole. 

I  am  not  quite  certain  whether  research  in  computer  vision  fits  into 
such  a  pattern. 

During  the  last  decade,  many  man-years  have  been  spent  on  investigating 
problems  conceived  as  relevant  parts  of  some  nebulous  whole.  To  the 
extent  that  we  have  gathered  understanding  of  the  difficulties  inherent 
in vision, the results have certainly been relevant. Whether they are
applicable in the context of future, complete vision systems, is a
different consideration.

Research  in  this  field  has  been  more  or  less  confined  to  an  idealized 
world  of  objects  whose  surfaces  are  all  planar.  The  rationale  behind 
this  is  twofold.  Such  objects  are  comparatively  easy  to  represent  in  a 
computer, and many every-day manipulatory tasks, interesting from the
standpoint of Artificial Intelligence, involve such objects.

The implicit assumption has been that from this kind of first
approximation to computer vision we should be able to build more
generally applicable systems.

Such  assumptions  are  dangerous,  in  my  opinion.  The  confinement  to 
planar  faced  objects  has  invited  all  kinds  of  klugery  and  special-case 
analysis  that  is  completely  irrelevant  to  perception  of  more  general 
objects. 

Many  people  have  elected  to  further  limit  the  scope  of  their  research  to 
the  segmentation  of  ideal  line-representations  of  scenes,  in  terms  of 
their constituent object-interpretations. While that subproblem is by
no  means  trivial,  and  certainly  elucidating  in  its  own  right,  it  would 
seem  not  immediately  relevant  to  the  intensely  practical  realities  of 
computer  vision,  even  in  the  restricted  context. 

The seeming simplicity of the subproblem (dealing with planar faced
objects)  has  seduced  us  into  attempting  solutions  with  limited 
machinery,  using  restrictive  assumptions  and  special-case  heuristics.  I 
think  this  is  unfortunate,  but  maybe  a  "necessary"  way  to  develop  this 
young  field  of  research. 

Vision IS a hard problem. I guess we have all learned from the lack of
spectacular  results  so  far. 

This  section  deals  with  related  history  in  computer  vision,  describes 
the  mad  quest  for  the  Perfect  Line-drawing  (alias  Pimpernel),  and 
discusses hierarchical (local decision) versus model driven schemes.

3.1 BRIEF PERSPECTIVE ON RELATED EFFORTS

"... Is He in Heaven? Is He in Hell?
That damned elusive Pimpernel?!"

[Roberts 1963]:

For current purposes the history of computer vision starts with Roberts.
His work covered the complete spectrum, from camera output to
three-dimensional interpretation, and is in that sense a unique effort.
Most other work in computer vision, so far, has dealt with subsystems.
But even Roberts paid scant attention to the pre-processing stages of his
system, concentrating on the aspects of handling representations of
three-dimensional objects.


Using a facsimile scanner on a photograph of the scene to provide
picture input, he then deploys some fairly simple heuristics to abstract
a connected line-drawing from the original raster. The program
subsequently finds all well-shaped regions, and attempts to match
constellations of such regions with similar constellations recorded for
the three-dimensional models. This is performed in a series of steps,
each one performed when the previous one yields no results, and each one
requiring less information than the previous one:

1.  Using  regions  around  a  vertex. 

2.  Using  regions  surrounding  a  line. 

3.  Using  a  region  and  a  third  line  from  one  vertex. 

4. Using a three-line vertex.

Note that step 4 signifies a liberation from the requirement for
well-shaped regions, but since 4 points are enough to determine a perfect
partial projection of any of his models, this easily leads to forbidding
combinatorics and nonsensical interpretations in non-trivial scenes.
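
The ordering of the four steps above, where each step is tried only when the previous one yields no results, amounts to a simple fallback cascade. The following is a minimal sketch of that control structure only. The finder functions and the dictionary used as a scene are hypothetical stand-ins; nothing here reproduces how Roberts' program actually located its key matches.

```python
def first_successful_key(scene, strategies):
    """Try the key-finding strategies in order.

    Returns (name, matches) for the first strategy that yields any match,
    or (None, []) when every strategy comes back empty.
    """
    for name, find in strategies:
        matches = find(scene)
        if matches:
            return name, matches
    return None, []


# Hypothetical finders, ordered from most to least information required.
strategies = [
    ("regions around a vertex", lambda s: s.get("vertex_regions", [])),
    ("regions surrounding a line", lambda s: s.get("line_regions", [])),
    ("region and third line", lambda s: s.get("region_plus_line", [])),
    ("three-line vertex", lambda s: s.get("three_line_vertices", [])),
]

scene = {"three_line_vertices": ["v7"]}
print(first_successful_key(scene, strategies))
```

Because the last step accepts the weakest evidence, it is also the step most likely to produce spurious keys, which is the combinatorial problem noted above.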

A  fixed  set  of  3  models  (parallelepiped,  wedge,  hexagonal  prism)  is  used 
and,  given  the  Key  match,  the  picture  and  model  points  are  "cycled 
around"  in  order  to  "line  up  the  order  of  the  polygons".  If  the  orders 
can  be  matched,  a  list  of  equivalent  point-pairs  is  created,  the 
transformation from the 3D model to the scene representation is
computed,  as  well  as  the  error  of  fit.  If  the  match  is  acceptable  the 
lines  belonging  to  the  object  projection  are  removed  from  the  scene,  and 
the  process  iterates. 

The treatment of composite objects, or rather, the interpretation of
complex objects as conglomerates of instances of his simple models, is
of particular interest. When a picture polygon is divided during the
process of back-projecting an object, lines inside that region - and
belonging to that object - are inserted, and that fact is remembered so
that a linkage of the parts of a composite object may be obtained. Such
objects may then be back-projected in any position and under any
rotation.

The models are not fixed as to size, so that any right-angle
parallelepiped would match his "cube"-model, for instance. They are
fixed as to skew, however.

To sum up, Roberts' work constitutes an important initial effort. His
program worked on very simple scenes under ideal conditions. The
preprocessing heuristics are not sophisticated enough to handle complex
scenes, and the matching program seems highly dependent on perfect
line-drawings. The program also seems dependent on the fixed set of
models, so that the incorporation of a new model would require program
changes. This drawback is somewhat offset by allowing for composite
objects. The treatment of such objects, and the three-dimensional
manipulations, are particularly interesting.


Quoting  Roberts:  "The  biggest  benefit  of  this  investigation,  however,  is 
an  increased  understanding  of  the  possible  processes  of  visual 
perception."


[Guzman 1968]:

The major contribution of Guzman was the demonstration that - for quite
complex scenes - assuming essentially perfect line-drawings without
shadows or other irregularities - one may very often infer
interpretations in terms of projections of three-dimensional objects
(body segmentation), using a rather limited set of chiefly local
heuristics based on the properties of the vertices in the scenes.

Using  such  heuristics,  weak  or  strong  links  (depending  on  the  vertices 
in  question)  may  be  established  between  regions  across  lines.  Regions 
are  subsequently  grouped  together  into  "nuclei"  according  to  the  nature 
of such links, and the final nuclei constitute the body interpretations.

Little global context is used, and that is provided in a limited way by
mechanisms for link inhibition in some contexts of T-joints, and for
link creation in cases of matching T:s (continuing, obstructed objects).
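
The grouping of regions into nuclei on the basis of links can be sketched as follows. This is a deliberately simplified illustration, not Guzman's program: it treats all links alike instead of distinguishing weak from strong ones, it omits link inhibition, and the merging rule (two nuclei merge when at least `min_links` links join them) and its default threshold are assumptions made for the example.

```python
from collections import Counter

def group_regions(regions, links, min_links=2):
    """Merge regions into nuclei.

    `links` is a list of (region, region) pairs, one entry per link.
    Two nuclei are merged when at least `min_links` links join them
    (assumed rule); link counts are recomputed after every merge.
    """
    nucleus = {r: frozenset([r]) for r in regions}  # region -> its nucleus
    merged = True
    while merged:
        merged = False
        counts = Counter()
        for a, b in links:
            na, nb = nucleus[a], nucleus[b]
            if na != nb:
                counts[frozenset([na, nb])] += 1
        for pair, n in counts.items():
            if n >= min_links:
                na, nb = tuple(pair)
                union = na | nb
                for r in union:
                    nucleus[r] = union
                merged = True
                break  # recount links between the new nuclei
    return set(nucleus.values())


regions = ["A", "B", "C", "D"]
links = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(group_regions(regions, links))
```

In the example, C joins the nucleus {A, B} only after A and B have merged, since its two links go to different regions of that nucleus. D stays separate because a single link is below the threshold. A missing line changes the link counts directly, which illustrates why this kind of grouping is sensitive to imperfect drawings.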

A body may sometimes be correctly identified even though an interior
line  be  missing,  provided  the  resulting  regions  get  linked  together 
strongly  enough. 

By  the  same  token  several  objects  may  be  clumped  together  into  one,  due 
to  missing  exterior  lines. 

Basing  segmentation  on  the  formation  of  links  between  regions  makes  this 
program  very  sensitive  to  imperfections. 

Thus  Guzman’s  program  is  dependent  on  a  very  clever  preprocessor, 
unrealistically  clever,  in  fact.  I  shall  get  back  to  that  subject  later 
in  this  section,  but  let  us  just  list  a  few  things  such  a  preprocessor 
is supposed to be able to do. It must produce an essentially perfect
line-drawing, that is, eliminate lines caused by shadow effects (as well
as  other  spurious  lines).  It  must  also  insert  missing  lines,  and  group 
together  lines  into  vertices  correctly,  since  the  proper  links  would 
otherwise  not  be  formed.  No  small  task! 

The  results  of  Guzman’s  work  seemed  impressive.  His  program 
successfully  analyzed  very  complex,  carefully  constructed  scenes.  One 
tends to forget that these scenes are not "live", and that the results
are in that sense not immediately and implicitly relevant to the
practical problem of computer vision.

[Winston 1970]:

Based on perfect (synthetic) line-drawings and region analysis, Guzman's
program was used by Winston in a system that analyzes scenes
structurally, and learns structural descriptions from examples.

Since Guzman several people have chosen to work on the same subproblem,
namely the interpretation of an essentially perfect line-drawing (with
or without shadow lines) as a conglomerate of objects.

[Orban 1970]:

Orban provided some shadow-eliminating heuristics that could be used in
conjunction with Guzman's program, in a preprocessing stage. Those
heuristics were local in nature and based on the observation that joints
caused by shadows often are X:s and T:s, and that such joints are often
chainwise linked together.

[Huffman 1969] & [Clowes 1971]:

These two authors devised labelling schemes to catalogue the possible
interpretations of vertices that may be found in perfect, shadowless
line-drawings of scenes of trihedral objects. Such labellings may serve
to provide more global contexts for segmentation processes.

Such and related concepts were utilized by Falk and expanded by Waltz.

[Falk 1970]:

In the context of the Stanford Hand-Eye Project [Feldman et al 1969],
Falk embarked on the development of programs to "interpret imperfect
line-drawings as three-dimensional scenes".

Utilizing  a  vertex  labelling  scheme  related  to  Huffman’s  ideas,  Falk 
devised  heuristics  for  body  separation,  which  work  for  more  general 
scenes than those of Guzman, inasmuch as some cases of missing lines,
or parts of lines (at object intersections), do not cause the program to
make erroneous decisions.

After  body  separation,  such  lines  may  subsequently  be  detected  and 
inserted.  The  program  uses  the  vertex  labels  to  form  links  between  the 
lines  in  the  drawing,  and  the  bodies  are  defined  in  terms  of  such  links. 
The  assignment  of  regions  to  objects,  and  possibly  of  dividing  regions 
between objects, is a secondary problem.

Note that basing segmentation on links among lines, rather than between
regions, makes this approach less error sensitive than Guzman’s.

Following  body  segmentation,  some  simple  heuristics  are  used  in 
determining  occlusion  relations  over  the  scene,  and  the  extracted  bodies 
(which  may  be  partially  occluded)  are  completed  as  far  as  possible, 
based on collinearities and extension-vertices.

Base edges are then found, and support relations are inferred. Such
data is used in the determination of locations in space, below.

The recognition part of the system works with a programmed set of fixed
size models, for which the numbers of faces and vertices are stored for
each different view, along with the number of sides for each visible
face. Such properties are compared with those of bodies in the scene,
giving a certain list of possible matches for each body. Secondarily
the nature of the regions is used in order to reduce such lists, and the
final choice is based on physical properties - lengths and angles -
which are computed from the monocular view, using hypotheses of ground
plane or object support. The use of objects supporting objects (flat on
top) represents an extension of Roberts' work, which only used the ground
plane assumption. If an object cannot be recognized, a second attempt
is made, using relaxed parameters.
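
The first stage of the recognition described above, where the counts stored for each view of a model are compared with those of a body in the scene, can be sketched as a table lookup. The sketch is illustrative only: the class and function names are invented here, the two models and their views are made-up sample entries, and the later stages (region properties, lengths and angles, relaxed parameters) are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViewSignature:
    faces: int             # number of visible faces
    vertices: int          # number of visible vertices
    sides_per_face: tuple  # sides of each visible face, sorted


def signature(face_side_counts, vertices):
    """Build a signature; sorting makes the order of faces irrelevant."""
    return ViewSignature(len(face_side_counts), vertices,
                         tuple(sorted(face_side_counts)))


def candidate_models(body, model_views):
    """Names of models with at least one stored view equal to the body's signature."""
    return sorted(name for name, views in model_views.items() if body in views)


# Hypothetical model table: each model lists the signatures of its views.
model_views = {
    "cube": {signature([4, 4, 4], 7), signature([4, 4], 6), signature([4], 4)},
    "wedge": {signature([3, 4, 4], 6), signature([4, 4], 6), signature([3, 4], 5)},
}

body = signature([4, 4], 6)  # two four-sided faces, six vertices
print(candidate_models(body, model_views))
```

A body showing two four-sided faces is ambiguous in this table, which is the situation where the subsequent tests on regions, lengths and angles are needed to make the final choice.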

The identities and locations in space of the recognized objects are now
known, and the objects may be back-projected and compared with the
original line-drawing. Techniques akin to those of Roberts are used
here, including a fairly simple hidden-line eliminator. The
correspondence of the original and projected drawing is evaluated, line
for line, based on some parametric tolerances regarding the number and
nature as well as the closeness of coincidences between original and
back-projected lines.


Falk's program is related to both Roberts’ (the use of models) and
Guzman’s (the implementation of body segmentation). The combination of
those techniques is an interesting idea. The program is somewhat less
error sensitive and more practically useful than Guzman’s, and it has
been successfully demonstrated on live data (using a preprocessor coded
by yours truly). However, due to the way segmentation is implemented -
unrelated to recognition - it shares the weakness of Guzman's program
(to a great extent) of being unable to cope with realistically imperfect
line-drawings. Like that program, it is shadow and noise sensitive.

[Waltz 1972]:

The concepts of vertex labelling introduced by Huffman have been
extended by Waltz, who not only handles, but actually also utilizes,
shadows in the process of segmenting the scene. Here, too, the original
data is a line-drawing, which is assumed essentially perfect. The
following is a brief sketch of how the program works.

First  the  vertices  in  the  scene  are  labelled  according  to  all  their 
possible  interpretations,  given  that  the  scene  consists  of  convex, 
trihedral  objects.  Each  label  at  a  vertex  assigns  a  specific  label  to 
each  one  of  its  lines.  Such  line  labels  cover  most  of  the  possible  edge 
interpretations  in  a  three-dimensional  scene  (convex,  concave,  bounding, 
obscuring,  crack,  shadow),  and  they  also  cover  the  lighting  conditions 
on  the  sides. 

Waltz  now  applies  a  filtering  program  which  checks  the  inter-consistency 
of  the  two  sets  of  vertex  labels  for  each  line,  deleting  inconsistent 
labels  (i.e.  where  the  line  labels  could  not  agree).  This  filtering 
program  was  found  to  assign  unique  labels  in  a  surprisingly  large  number 
of  cases.  If  the  labelling  is  not  unique,  a  full  tree-search  for 
consistency  is  performed  over  the  entire  scene,  and  inconsistent  labels 
are deleted.

The resulting labelling determines the segmentation(s) of the scene. If
not unique, the labelling gives rise to several possible
interpretations, which is one of the strong points of the program. It
doesn't jump to conclusions.
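
The filtering step described above, which deletes vertex labels that cannot agree with any label at the other end of a shared line, can be sketched as a repeated pairwise consistency check. The following is a simplified illustration under stated assumptions, not Waltz' program: the data layout is invented, a line label is compared as a plain symbol (direction-dependent labels are ignored), and the subsequent tree-search for the non-unique case is not shown.

```python
def filter_labels(candidates, line_ends):
    """Delete vertex labellings that no neighbouring labelling agrees with.

    `candidates` maps a vertex to a list of labellings; each labelling maps
    every line at that vertex to a line label. `line_ends` maps a line to
    its two end vertices. Both layouts are assumptions of this sketch.
    """
    current = {v: list(labs) for v, labs in candidates.items()}
    changed = True
    while changed:  # repeat until a full pass deletes nothing
        changed = False
        for line, (u, v) in line_ends.items():
            for a, b in ((u, v), (v, u)):
                allowed = {lab[line] for lab in current[b]}
                kept = [lab for lab in current[a] if lab[line] in allowed]
                if len(kept) != len(current[a]):
                    current[a] = kept
                    changed = True
    return current


# Illustrative scene: three vertices joined by two lines.
line_ends = {"pq": ("P", "Q"), "qr": ("Q", "R")}
candidates = {
    "P": [{"pq": "convex"}, {"pq": "concave"}],
    "Q": [{"pq": "convex", "qr": "shadow"}, {"pq": "concave", "qr": "convex"}],
    "R": [{"qr": "convex"}],
}
print(filter_labels(candidates, line_ends))
```

In this example the single label at R removes one labelling at Q, which in turn removes one at P, so the deletion propagates along the lines until every vertex is left with a unique label.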

ualtz'  oork  contains  sore  of  the  more  amazing  case  ana.gsis  I  have  seen. 
The  speci  f  ic  I  ty  of  labellings  aids  in  the  seg.entat.on  process,  but  also 
»akes  the  prog™  sensitive  to  it. .perfections  in  the  I ine-drauing.  The 
progra.  handles  shades,  in  fact  categorizes  the,  as  such,  but  those 
have  to  be  consistent  as  nell  as  the  rest  of  the  I  ines,  other  uords 
ue  still  have  essential iy  the  require*,,  for  perfect  I i ne-drauings, 
though  this  time  with  (perfect)  shadows. 

So»e  facilities  for  dealing  uith  fissing  lines  are  included  in  ,he 

progra,.  Pore  preciselg,  the  case  of  a  .issing  interior  line  of  an 
object  is  treated,  si,plg  bg  i„clodi„g  such  special  case  t 

in  the  labelling  scheme. 

Waltz' program is so far the most elaborate line-drawing segmenter in
existence. It represents a radical departure from earlier (local-
heuristic) schemes, in that the entire context is utilized and the scene
is interpreted as a whole. That is an important achievement, in my
opinion.


However, in order to be practically useful, the program would require an
unrealistically clever preprocessor, a need it shares with all
segmenters discussed here so far. It is dependent on very special rules
regarding the labels (shadows, background, etc.), which makes it error
sensitive. Furthermore the order in which the labelling is performed
may be crucial to the final result.

This concludes the discussion of systems based on the assumption of
essentially perfect or almost perfect line-drawings (with or without
shadows). Comparatively few people have ventured into the messy
realities of live scenes - fewer have emerged with anywhere near
spectacular results.

Some  preprocessors: 

Visual preprocessors have been constructed by Binford [Binford 1970],
Brice and Fennema [Brice & Fennema 1963], Hueckel [Hueckel 1971 & 1973]
(edge-finder), Pingle [Pingle & Tenenbaum 1971], Baumgart (cf. end of
this subsection), and others.

Tenenbaum [Tenenbaum 1970] authored a substantial thesis on accommodation
in vision, including work on edge (line) verification and depth through
variable focus.

My own experiences in the field of preprocessing will be discussed
shortly.

I  shall  here  first  briefly  deal  with  an  interesting  heterarchical 
approach  to  the  problem,  and  with  a  limited  system  using  learning  and 
recognition.  I  shall  also  mention  an  effort  regarding  vision  of  more 
general objects, and an approach using sequences of views.



[Shirai 1972]:

Shirai constructed a program that tries to find bodies in a scene,
working directly on the digitized picture, and utilizing an initially
extracted, perfect contour (the background is black, and the objects are
white).

The  process  is  heterarchical,  and  analyzes  the  data,  looking  for  lines 
and  vertices,  with  a  general  concept  of  "body"  as  a  guide.  It  utilizes 
the  information  it  has  already  obtained,  in  order  to  further  complete  an 
object.  This  is  in  contrast  to  hierarchical  schemes,  where  successively 
higher  abstractions  are  formed  more  or  less  in  sequence,  by  a  hierarchy 
of  heuristic  processes. 

More  specifically,  the  program  looks  for  lines  at  concave  junctions  in 
the  contour,  or  in  other  interesting  places.  Having  found  evidence  of  a 
line,  it  tracks  along  that  line,  looking  for  vertices  or  extensions, 
determining  the  implications  of  its  findings  as  far  as  the  concept  of 
object  goes.  The  global  context  (of  object)  enables  parameter  threshold 
adjustments  according  to  current  search  contexts. 

Shirai's program is an interesting and promising effort toward a
heterarchical  vision  system.  The  idea  of  having  recourse  to  the 
original  intensity  information  throughout  the  process  of  segmentation  is 
a  good  one,  although  not  new,  of  course. 

The program does not work in the presence of shadows or other
detrimental effects, and is not general enough for concave objects.
However, in simple scenes, under ideal lighting conditions, it should do
an adequate job. It is therefore a member in the sparsely populated
class of practically applicable vision systems. So is the final related
program to be described here.

[Underwood & Coates 1972]:

This work is related to mine, in that it uses learning and recognition.
It is a limited vision system, working only with single objects, which
are planar faced and convex.

Interactively with an image dissector camera, an edge-line-drawing is
obtained. Regions are then found and their connectivity investigated.

During  the  learning  phase,  the  program  is  presented  with  views  showing 
all the different surfaces of an object, and it is able to form an
internal,  complete  model  based  on  topology  and  on  certain  projection 
invariant  shape  measures  for  the  faces.  Those  topology  models  represent 
planar  unfoldings  of  the  objects,  except  for  actual  surface  sizes. 

Equipped  with  a  set  of  previously  learned  models,  the  program  is 
subsequently  able  to  match  any  view  of  such  an  object  with  one  or  more 
of  the  models,  using  the  topology  and  the  number  of  sides  of  each 
surface in view. If there are several matches, the shape measures are
compared in order to form probabilistic estimates of the closeness of
fit.

As a system, it is limited to scenes of single objects. It is
complete in the sense that it goes all the way from camera input to
recognition. The preprocessing phase of the program is required to
produce perfect line-drawings, otherwise the recognition (or learning)
will not work.

The idea of "learning by looking" is an appealing one.

[Agin 1972]:


Agin has provided some interesting initial work on the representation of
curved objects, and the use of laser ranging to obtain depth
information.

A curved object is represented through its typical cross-sections along
a main axis. The laser is utilized in mapping out the surface
curvature. As mentioned, Agin also notes the possible use of a laser to
get depth information in the context of a system like the one presented
in this paper. I definitely see that as a good idea, as indicated
in the section on future work (Section 12).

[Baumgart 1974]:


Baumgart's system, which is currently under development, uses sequences
of views (for instance by rotation) in analyzing scenes. Besides being
able to provide 3D information, this process often tends to neutralize
the effects of shadows, glare, noise, etc. The edge-finding stage of
preprocessing is based on thresholding and merging.


This concludes the brief outline of related efforts in computer vision.


3.2  OWN EXPERIENCES - THE MAD QUEST

"Mine is a long and sad tale..."

As we have seen, the usual first step in interpreting a digitized TV-
raster by computer (as a visual scene) has been to condense the
information, to abstract relevant parts of it and thus get a smaller
database, and one that is more conveniently formatted for further
analysis.

The  traditional  way  of  abstracting  and  condensing  such  information  is  to 
apply  a  brightness-discontinuity  detecting  operator  over  the  entire 
picture,  analyzing  a  small  fraction  of  it  at  a  time.  This  provides  a 
map  of  the  relevant  parts,  namely  where  the  brightness  changes  occur, 
and  where  we  may  hope  to  find  (for  instance)  outlines  of  objects. 

Roberts used this approach, followed by a line-fit, and it is used today
(ten years later), at the Stanford Hand-Eye Project. Our edge-operator,
however, is a much more powerful one (described in [Hueckel 1971 &
1973]).
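
To make the idea of a local brightness-discontinuity operator concrete, here is a minimal Python sketch of a 2x2 diagonal-difference operator in the style usually attributed to Roberts. It is not the Hueckel operator referred to above; the function name `roberts_edges`, the list-of-rows picture format, and the threshold value are assumptions chosen for the example.

```python
def roberts_edges(image, threshold):
    """Return the (row, col) positions whose 2x2 diagonal-difference
    gradient magnitude exceeds `threshold`.

    image: list of equal-length rows of brightness values.
    """
    edges = []
    for r in range(len(image) - 1):
        for c in range(len(image[0]) - 1):
            # Differences along the two diagonals of the 2x2 window.
            d1 = image[r][c] - image[r + 1][c + 1]
            d2 = image[r][c + 1] - image[r + 1][c]
            if (d1 * d1 + d2 * d2) ** 0.5 > threshold:
                edges.append((r, c))
    return edges


# A dark region (0) on the left and a bright region (9) on the right.
picture = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edge_points = roberts_edges(picture, threshold=5)
```

Only the windows straddling the dark-to-bright boundary respond, which is the "map of the relevant parts" that a subsequent line-fit would work from.
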

The initial line-fit further reduces the database, further abstracts
relevant information, and further renders it a format suitable for
interpretation. These things are described in not too much detail in
Section 7.


We now come to a somewhat crucial point, namely:


What is the general nature of such initial line-drawings? To what extent
can we expect to rely on their information? Is everything there, that
should be there? More?


Some alternative answers are: "Hm", "Yes!", "No!", "(Mumble...)",
"Maybe (?)", ...


Loose answers to loose questions, but let us for a moment assume that
the line-drawing is perfect. In such a case there are no problems. We
group the lines into vertices, based on the (small) intersection
distances, detect T-joints, find closed regions. All of this becomes
more or less trivial. However, the task of interpreting the jumble of
regions (or lines and vertices) as a jumble of objects is non-trivial.
We have seen several different approaches to that problem, in the
preceding subsection.
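
For a perfect line-drawing, the grouping of lines into vertices can be sketched in a few lines of Python. The sketch below simplifies the criterion to plain end-point closeness (rather than intersection distances); the function name `group_into_vertices` and the tolerance `max_dist` are assumptions made for the example.

```python
import math


def group_into_vertices(line_ends, max_dist):
    """Cluster line end-points that lie within `max_dist` of each other.

    line_ends: list of (x, y) end-points.
    Returns a sorted list of clusters, each a list of indices into line_ends.
    """
    parent = list(range(len(line_ends)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(line_ends)):
        for j in range(i + 1, len(line_ends)):
            if math.dist(line_ends[i], line_ends[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(line_ends)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


ends = [(0, 0), (0.5, 0), (10, 10), (10, 10.4), (20, 0)]
vertices = group_into_vertices(ends, max_dist=1.0)
```

A purely local rule of this kind is adequate only when the drawing is clean; with gaps and spurious lines no single tolerance separates true vertices from accidents, which is the difficulty taken up in the rest of this subsection.
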


It was my intention, when I started in this field a few years back,
to use existing programs as building blocks for a system that would work
on live scenes. The Hueckel edge-finder existed in a powerful enough
incarnation. We had a copy of Guzman's program (boldly called "SEE").
I set out to investigate what kinds of results one could obtain, using
the edge-drawings to produce the highest possible quality line-drawings
(with reasonable effort), and then entrusting those to "SEE" for
segmentation.


Amazingly enough, the program I came up with worked reasonably well for
simple scenes [Grape 1969 and 1970]. By then it had been elaborated
considerably.


The following table briefly describes the flow of the final version of
that preprocessor (K stands for Kluge):

K1.  Edge detection.

K2.  Initial line-fit.

K3.  Formation of initial vertices, based on closeness of edges.

K4.  Formation of exhaustive cross reference tables, for each
line-end listing the best 3 extension intersections,
blocking lines, possible cuts, nearest collinear line.

K5.  Using that cross reference table to form secondary
vertices (iteratively, in a parameter relaxing loop),
using brightness information as an additional criterion.

K6.  Grouping of initial, secondary, and final vertices into
final vertices (i.e. iteratively), using different
heuristics according to appearances of the constituent
vertices. This necessitated fairly elaborate heuristics
to prevent the line-drawing from self-destructing, since
moving a line (to accommodate a new vertex) might cause
secondary movements to existing vertices, especially in
connection with T-joints. It was solved essentially by
allowing the line-drawing to float, only describing the
connectivity, until all vertices were determined, at which
time their constituent lines were all weighted to provide
the best possible vertex coordinates.

K7.  Finding connected paths, outsides and insides.

K8.  Determining closed regions.

K9.  Line prediction and verification. This loop was based on
criteria for well-shaped-ness of regions, and on
parallelogram completion. Predicted lines were accepted
or rejected on the basis of the number of edge points
found inside an elliptic (or sometimes rhombic or
rectangular) operator of parametric width, and with the
predicted line as main axis.

K10. Producing the resulting line-drawing in the format
required by "SEE" (also known as "Guzmanizing").
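
The acceptance test of step K9 can be sketched as follows. This Python fragment is a reconstruction under stated assumptions, not the original preprocessor: the function names, the interpretation of "parametric width" as the full minor-axis length, and the acceptance threshold `min_points` are all hypothetical.

```python
import math


def count_points_in_ellipse(edge_points, p0, p1, width):
    """Count edge points inside an ellipse whose main axis is the predicted
    line p0-p1 and whose full minor-axis length is `width`.
    Assumes p0 != p1 and width > 0."""
    cx, cy = (p0[0] + p1[0]) / 2.0, (p0[1] + p1[1]) / 2.0
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    a, b = length / 2.0, width / 2.0      # semi-axes
    ux, uy = dx / length, dy / length      # unit vector along the line
    count = 0
    for (x, y) in edge_points:
        along = (x - cx) * ux + (y - cy) * uy       # coordinate along the line
        across = -(x - cx) * uy + (y - cy) * ux     # coordinate across the line
        if (along / a) ** 2 + (across / b) ** 2 <= 1.0:
            count += 1
    return count


def accept_predicted_line(edge_points, p0, p1, width, min_points):
    """Accept the predicted line if enough edge points support it."""
    return count_points_in_ellipse(edge_points, p0, p1, width) >= min_points


points = [(1, 0.1), (5, -0.2), (9, 0.05), (5, 3), (20, 0)]
supported = accept_predicted_line(points, (0, 0), (10, 0), width=2, min_points=3)
```

Widening the operator makes the test more tolerant of badly placed predictions at the cost of admitting unrelated edge points; a rhombic or rectangular operator would differ only in the inside test.
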


The final building block in that vision system (at that time) was then
provided by "SEE".


Note that step K9 constituted a step away from sequentialism, and toward
modelism. The model, in this case, was the concept of a well-shaped
region. Note also the interaction with the original edge-data.

In the meantime Falk wrote his program (described above), for which my
preprocessor was expected to provide reliable input. The prediction -
verification loop was then unfortunately not yet accessible for common
use, and consequently Falk had more trouble than necessary in obtaining
scenes on which to demonstrate his program. Luckily, my preprocessor
incorporated facilities for editing line-drawings.

It became increasingly clear to me that a perfect line-drawing could
not be achieved by a preprocessor based on local heuristics. Most
people in vision work probably agree, by now. I never expected to be
able to provide such line-drawings in general (by a long shot), but the
messiness of live data exceeded my expectations. Due to disturbances in
the scene, as well as hardware glitches, one is almost always faced with
defective initial data (for scenes of reasonable complexity) in such a
way as to make locally based decisions impossible. Experience, in
both cases, has demonstrated the need for global knowledge of some
form, even at the intermediate and low levels of computer vision.

The next subsection is an attempt to analyze that need for global
knowledge. Possible solutions are discussed, particularly as provided
by the deployment of prototype-driven schemes.

3.3  SEQUENTIALISM VERSUS MODELISM


The  following  is  a  clarification  of  the  title  of  this  subsection. 

The term sequentialism refers to the common method of sequentially
finding successively higher abstractions (starting with the digitized
image, and possibly ending with body-segmentation), where an abstraction
phase cannot be repeated at the request of some higher-level procedure,
using global contexts. Decisions in sequentialism are often of
necessity based on local contexts.

By modelism I here refer to the utilization of global knowledge at
various  levels  (by  the  use  of  a  concrete  set  of  models,  or  an  abstract, 
driving  concept),  in  such  a  way  that  low-level  decisions  may  be  subject 
to  revision,  based  on  the  findings  of  higher-level  processes,  or  that 
such  knowledge  is  used  to  drive  those  stages  in  the  vision  process. 

We have noted the assumption of essentially perfect line-drawings for
several vision projects described previously. Those provide examples of
sequentialism. On the other hand, for instance, Shirai's program works
somewhat in the spirit of modelism, inasmuch as it interprets the
picture with the concept (model) of object as a driver of the process,
and actively looks for objects in the scene from the beginning.

The idea of actively looking for things, based on various clues present
in a tentative initial line-drawing of the scene, is the basic principle
behind the vision system described in this paper.

Let me now give an example to illustrate the difficulties of vision
through sequentialism.


Figure 3.1 demonstrates the hazards of locally based decisions, in the
formation of vertices or in otherwise interpreting scenes.

Now take a look at Figure 3.2, which shows the complete initial line-
drawing, from which the close-ups in the previous figure are excerpts.

Being human, we understand this scene very quickly, now that we can see
all of it. But that is exactly what it takes here! Not necessarily
to be human, but the ability to see global relationships, and to be able
to interpret those, even in the presence of spurious data and the absence
of lines that should have been there.

The principles of human vision are not necessarily something we want to
emulate, in order to create a computer vision system. We simply
couldn't! But that great Master, Evolution, has had a long time at his
disposal, and we should learn as much as possible from our own ways of
visually perceiving the world.

I think we tend to see the whole before the details, as a rule. The
examples I have just given certainly support such a theory. Seeing the
global relationships, we are able to correctly interpret or classify the
events in the partial pictures. Sequentialism, of course, attempts to
do exactly the opposite, namely classify the local relationships, and
from them somehow to infer the whole.

[Figure 3.2: Advantages of modelism]


In my opinion, and judging from my own experiences, sequentialism is
doomed to failure, at least in dealing with realistically complex visual
scenes. The concept of global knowledge is crucial, not only at low and
intermediate levels (such as vertex formation) but also at the level of
three-dimensional interpretation. Here, global concepts enter in the
form of support theory, understanding of depth and occlusion relations,
etc. Ideally, I think, this should also interact down to the lower
levels.


I  am  now  about  to  embark  on  the  main  purpose  of  this  paper,  the 
presentation of a vision system in the modelistic spirit. It learns its
prototypes,  and  the  understanding  of  the  scenes  is  based  on  recognition. 
It  tends  to  see  global  structures  in  somewhat  the  same  way  I  do,  and  is 
therefore  relatively  insensitive  to  imperfections  in  the  scene 
representations. 


4.0  STRATEGY  OVERVIEW 


4.1  GENERALITIES 

It  should  be  clear  by  now  that  the  purpose  of  the  present  intermediate- 
level  vision  system  is  not  to  produce  a  "perfect  line-drawing",  to  be 
further processed by "higher-level" programs. The perfect(ed) line-drawing
rather has the character of an optional by-product, which is
nice  to  display  as  a  demonstration  perhaps,  but  which  is  not  necessary 
for  the  purpose  of  the  computer  "understanding"  and/or  being  able  to 
manipulate  the  scene. 

The purpose of the system presented here is to parse the scene,
"parsing" being defined as determining the nature and location of the
partaking objects as expressed in their two-dimensional projections. It
leaves the aspects of three-dimensional positions and relations to a
higher-level program, which is as yet non-existent as a whole, but for
which parts of Falk's work may be adapted (cf. Subsection 3.1). The
development of such a system is currently under way at the Stanford
Hand-Eye Project.

I shall only briefly discuss the role and reason for the dichotomy into
different levels of models, the main discussion having been presented in
Subsection 3.3. The ILV-system proposed in this paper uses 2D
prototypes, which are perspectively consistent projections of the
different views of their parent objects. This method has the advantages
of simplicity in representation, ease in feature-extraction and
convenience in mapping. Basically this system is an experimental subset
of a possible, more extensive system based on 3D models. From such
models the 2D prototypes might easily be extracted through systematic
projections. Some aspects of an extended system are treated in Section
12. A detailed account of the 2D models is to be found in Section S.

The present system runs a complete parse on the entire scene, stopping
only when the scene is exhausted. One good reason for this behaviour is
that there is nothing else for it to do, since the higher level (3D)
package does not exist. Given the presence of such a higher level
program, it may prove desirable to drive the parse from that extended
system, with full utilization of the concept of 3D and with the
necessary support theorems, etc., checking each mapping individually.

The concept of generality has been of considerable importance to the
author during the course of this undertaking, and the present 2D system
does (within its scope) have that desirable property. Input, analysis,
and learning of prototypes is fully automatic. There is no special case
analysis (with the exception of perspectively degenerate views) other
than that which is implicit in the structuring of the feature hierarchy
(Section 6), which influences the prediction - verification elements of
the matching program (Section 9). Perspectively degenerate views
(Subsection 6.8) have required some degree of special treatment.

Since it is out of the question to exhaust all mapping possibilities
between the scene and the prototypes on a random basis, the utilization
of easily extracted, easily mapped and recognized features becomes
operative. The features serve as keys for the matching process, and
are also used throughout that process for purposes of prediction and
verification.


The overall structure of the system is built up in the following way:

1.  Preprocessing (Section 7).

2.  Parsing (Section 8).

(3.)  Object completion (Section 10).

The third of those phases is an unimplemented possibility. Each of
these processes displays some non-sequential behaviour, notably the
parser, which uses a non-sequential (recursive) matching program. The
term "non-sequential" is used here to stress that the process is
different from that of "sequential abstractions".

Many examples of the proceedings of this system are provided in Section
11, and I strongly suggest that you take a preliminary look at those
now, before continuing (unless you are an expert, and know
what is going on). This should make the rest of the presentation easier
to follow.

4.2  STRATEGIES


More precisely the strategy is as follows:

1.  Fit lines to initial edge-data, iteratively and
conservatively.

2.  Parse the resulting line-drawing, in a looping
process, each time finding the best possible match
between elements of the scene and some prototype,
isolating that mapping, modifying the scene (by
removing the lines), and iterating. This process ends
when there are no more possible mappings.

3.  Bring isolated, incomplete mappings (parts of
prototypes) back into the scene, one by one, in order
of decreasing complexity. Then investigate for
possible extensions of the mappings (taken one by
one). This program could use the same principles as
(and indeed parts of) the mapping program.


The thus obtained interpretation of the line-drawing, as a 2D projection
of a 3D scene, consists of a set of disjoint, possibly only partially
mapped, prototype instances.
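
The looping parse described above (find the best match, isolate it, remove its lines, iterate) can be sketched in Python as follows. This is a schematic illustration only: `parse_scene`, `toy_matcher`, the representation of a scene as a set of line names, and the "largest overlap" scoring are assumptions made for the example, and stand in for the feature-driven matching program.

```python
def parse_scene(scene_lines, find_best_match):
    """Greedy parse without backtracking.

    scene_lines:     iterable of line identifiers making up the line-drawing.
    find_best_match: callable taking the set of remaining lines and returning
                     (prototype_name, matched_lines) or None.
    Returns the list of isolated mappings, in the order they were found.
    """
    remaining = set(scene_lines)
    mappings = []
    while True:
        match = find_best_match(remaining)
        if match is None:
            break                          # no more possible mappings
        prototype, matched = match
        matched = set(matched) & remaining
        if not matched:                    # guard: matcher made no progress
            break
        mappings.append((prototype, matched))
        remaining -= matched               # modify the scene: remove the lines
    return mappings


def toy_matcher(prototypes):
    """Hypothetical matcher: prefer the prototype covering most remaining lines."""
    def find_best_match(remaining):
        best = None
        for name, lines in sorted(prototypes.items()):
            hit = lines & remaining
            if hit and (best is None or len(hit) > len(best[1])):
                best = (name, hit)
        return best
    return find_best_match


prototypes = {"BOX": {"l1", "l2", "l3"}, "WEDGE": {"l3", "l4"}}
result = parse_scene({"l1", "l2", "l3", "l4", "l5"}, toy_matcher(prototypes))
```

In the toy run the second prototype is only partially mapped, because one of its lines was already claimed by the first match, and one scene line is left unexplained. The instances obtained are disjoint, as in the text.
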


At this stage it would be possible to further investigate the union of
these instances, comparing with the initial data, thus determining most
of the occlusion relationships between the different objects. We would
then quite often be able to obtain a corrected (I refrain from saying
"perfect") line-drawing. The reason I have abstained from implementing
such heuristics is simply that the program would not be dependable
enough, it would be somewhat klugy in nature, and it would be uncalled
for in the context of an extended system, where such relationships (as
mentioned earlier) would be much more elegantly and soundly determined
on the basis of positions in space, support relations, etc.

For the present the parsing program is non-recursive, i.e. it accepts
the currently best match between scene and prototypes, amends the scene
accordingly, and then carries on in the same style with the amended
scene. Another possibility might be to introduce recursion at that
level as well (as the matching level), thus keeping a number of
different alternative parses around, among which a most likely candidate
may be chosen (verified). The combinatorics, however, would seem rather
forbidding, as would storage requirements. Furthermore there is no
sufficiently established need (yet) to warrant an effort in that
direction.

From  this  strategy  overview  we  now  turn  to  more  detailed  accounts  of  the 
component  parts  of  the  system. 

5.0  FEATURES


5.1  INTRODUCTORY EXAMPLES

The use of features to provide mapping clues (and matching guides) from
scene-elements to prototype elements is essential to the system
presented here. I shall consequently deal with these concepts in some
detail.


The  general  idea  behind  this  system  is  one  of  recognizing  elements 
encountered  before,  as  parts  of  familiar  things  (objects).  After  such 
"first  impressions"  the  system  proceeds  to  verify  or  refute  its  initial 
theory regarding the identity of the object. The instruments of "first
impressions" are called "features".


The feature hierarchy is based on the (personal) observation that we get
strong visual clues (in a 2D image) from the way in which side regions
(face projections) of objects come together (information that we use
with the shape of the regions in order to make sense of objects).


Therefore the features have been constructed to contain extensive
information about region junctions, i.e. the lines (including end-vertex
constellations) of two-dimensional projections (object or prototype).
While the features do not contain full shape information, they provide
enough to serve as strong clues and as guides during the matching
process.



In order to avoid thousands of words I hasten to give an illustrative
example. Figure 5.1 shows a scene and some of the features we may find
in it (heavier lines).

I shall also give an example of a prototype and the features it
contains. That presentation will be followed by more precise
definitions. Figure 5.2 shows a 2D projection of a parallelepiped. We
see that the junction of any two faces (as given in the projection by
their common line) and the corner junctions at the ends of their common
edge (as given by the end-vertices of the line) presents one of only
three rotationally distinct line and end-vertex constellations (in the
plane), namely as given by L1, L2, and L3.


Those line constellations are examples of the basic feature, called
"line-feature" (abbreviation: LF), and there are three instances of each
one of them in the figure. It will become clear later why L1 and L2 are
essentially different. The LF:s are directional (this will be clarified
shortly), as indicated by the arrows.

There  is  one  more  level  in  the  feature  hierarchy,  namely  the  "compound" 
(composite/complex/combined)  feature  (abbreviation:  CF),  which  is  simply 
an  aggregate  description  of  two  connected  LF:s,  each  of  which  is  a  ray 
of  the  other.  Figure  5.3  demonstrates  this. 

That  description  shows  how  they  are  connected,  and  also  gives  additional 
joint  information  about  opposing  rays  extending  from  the  extreme  ends. 

The CF is a strong discriminator, which may be seen in the prototype
example (Figure 5.2) from the fact that there is one CF (C4 in the
figure) that contains references to all but two of the lines in the
projection of the parallelepiped. The CF:s are intended for use
primarily as initializations (keys) in the matching (mapping) process.

[Figure 5.2: Prototype PAREP and its features. Compound features
labelled in the figure include C1 = L2+L1, C4 = L3+L3, and C5 = L3+L2.]

Figure  5.2  shows  all  of  the  five  different  compound  features  of  the 
PAREP  prototype.  C3  and  C5  are  essentially  different  for  reasons  given 
below. 

Hoping  that  these  examples  have  provided  some  of  the  flavour  of  the 
feature  concept,  we  now  proceed  to  more  formal  definitions. 


5.2  FEATURE  DEFINITIONS 

The  features  conform  to  the  following  definitions: 

LF0.  A  line-feature  (LF)  is  an  encoded  description  of  certain 
basic,  projectively  invariant  characteristics  (in  2D)  of 
the combined junctions of the side-regions of two simply
connected  vertices  (representing  corners  in  3D),  i.e.  of 
a  line  and  its  end-vertices. 

CF0.  A  compound  feature  (CF)  is  an  encoded  description  of  the 
same  properties  for  three  chain-wise  simply  connected 
vertices,  in  terms  of  the  LF:s  of  the  two  connected 
lines,  and  additional  information. 

The concept of projective invariance is explained in Subsection 5.6.
"Line-feature" is sometimes, and somewhat loosely, used interchangeably
with "line" when the meaning is clear.


Note that CF0 covers the case of three vertices in a triangle, in which
we generally get three basically different CF:s. We may also get
only one, but never two in the case of a trihedral, interestingly
enough. A proof of this is given later (in Subsection S.S), not
because the result is of any use but because the case is of value as an
illustration of the concepts presented here.


We proceed now to a description of the encoded information (illustrated
by examples), which will be followed by discussions of some of the
properties of the LF and CF. Both kinds of feature are coded into 36
bits (one word) of storage, and are largely handled by the same
routines.


The line-feature consists of the following items for each direction of
the line (18 bits):


LF1.  LF - CF discriminator (flag).

LF2.  Number of rays forming an angle >180 (measured ccw. from
the ray) with this end of the parent line.

LF3.  Any of those rays approximately =180?

LF4.  Number of rays forming an angle of <180.

LF5.  Any of those rays approximately =180?

LF6.  Outside angle (<, =, > 180°), measured from the last ray
in item LF2 to the first ray in LF4, either of which may
be the base line itself. This item shows the convexity
of the vertex.

LF7.  Constellation of the two opposing rays on the right-hand
side of the parent line, traversed from the present end
to the other. Viz., are they converging to this side, or
diverging? Could they be parallel (allowing for
perspective)?

Item LF7 is of special importance for the prediction aspects of the
mapping program, as the sequel will show. Figure 5.4 provides examples
of LF:s and their encodings.
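
As an illustration of items LF2 through LF5, the following minimal
sketch counts the rays at one end of a parent line. It is not the
original program. It assumes that all directions are given in degrees
as seen leaving the end-vertex, that the angle of each ray is measured
ccw. from the ray to the parent line, that collinear rays (exactly 180)
are grouped on the "<180" side, and that "approximately" means within a
tolerance `tol`; the function names and the tolerance value are
hypothetical.

```python
def ccw_angle(from_deg, to_deg):
    """Counter-clockwise angle in [0, 360) from one direction to another."""
    return (to_deg - from_deg) % 360.0


def count_rays(parent_deg, ray_degs, tol=5.0):
    """Return (LF2, LF3, LF4, LF5) for one end of a parent line.

    parent_deg: direction of the parent line leaving this end-vertex.
    ray_degs:   directions of the other lines (rays) leaving the vertex.
    tol:        assumed tolerance for "approximately 180" (hypothetical).
    """
    n_over = n_under = 0
    near_over = near_under = False
    for ray in ray_degs:
        angle = ccw_angle(ray, parent_deg)  # measured ccw. from the ray
        if angle > 180.0:
            n_over += 1
            near_over = near_over or (angle - 180.0) <= tol
        else:
            # Exactly collinear rays are grouped on the "<180" side.
            n_under += 1
            near_under = near_under or (180.0 - angle) <= tol
    return n_over, near_over, n_under, near_under
```

For a parent line leaving the vertex at 0 degrees with rays at 90 and
270 degrees, the sketch reports one ray on each side and no near-180
flags.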

The compound feature contains, for each direction of traversal of the
line-pair (18 bits):

CF1. CF - LF discriminator (flag).

CF2. LF identifier for first line-feature in this direction
(refers to a central list of encountered LF:s).

CF3. Direction in which that LF is "traversed", going toward
the junction of the pair.

CF4. Position of the other parent line, ccw. around center
vertex, relative to this parent line. I.e. (1 + number-
of-rays-in-between).

CF5. Constellation of opposing end-rays, similarly to the
corresponding LF-item, but with additional bits for
collinearity, and for the direction in which these rays
(would) intersect (out from - or toward - the CF).

[Figure 5.4: The line-feature and its encoding. Three example
line-features, with base line, rays and opposing rays labelled; the
LF-items of each encoding are separated by hyphens: (0-1-0-1-0-0-1),
(0-1-0-1-0-1-1), (0-1-0-2-0-0-0).]

Examples  of  CF:s  and  their  encodings  are  provided  in  Figure  5.5. 
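
Item CF4 can be illustrated by a small sketch, under the assumption that
all lines at the center vertex are given as directions in degrees
leaving that vertex. The function name and the argument layout are
hypothetical and serve only to make the counting rule concrete.

```python
def orbital_position(this_deg, other_deg, ray_degs):
    """CF4: 1 + number of rays met when going ccw. around the center
    vertex from this parent line to the other parent line."""
    span = (other_deg - this_deg) % 360.0
    between = sum(
        1 for ray in ray_degs
        if 0.0 < (ray - this_deg) % 360.0 < span
    )
    return 1 + between
```

With no rays in between, the other parent line is at position 1; each
intervening ray pushes it one position further around the vertex.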

Both kinds of feature are subject to an internal ordering, so that if
the two halfwords (each describing one direction of traversal) are not
similar (the intuitive meaning is close to the formal definition), they
are ordered with the least halfword first. Similarity will be defined
immediately following this. Most LF:s are ordered (directional), and we
shall see that all CF:s are directional. Subsection 5.4 treats such
matters in detail.


5.3 THE FEATURE SIMILARITY RELATION

The  basic  idea  behind  the  similarity  concept  is  that  we  want  to  be  sure 
that  two  similar  features  are  projectively  equivalent,  in  terms  of  the 
junctions  of  their  affected  side-regions.  The  important  things  here  are 
the  number  of  side-regions,  their  constellations  at  parent-line  end- 
vertices,  and  also  the  angular  convexities. 

Information regarding shape of regions is of secondary importance and we
allow some laxity here, as indicated by the definitional exceptions for
parallelities and collinearities below. This is also inherent in the
feature implementation, since there are no references to secondary ray
constellations. The matching program is of course much more rigorous in
such matters (Section 9).

Definitions  of  feature  similarity: 

LFS. Two line-features are said to be similar (loosely "equal"
or "the same") if and only if all the items in the
LF-definitions are identical, with the exceptions that
"=180"-items are ignored, and convergence-divergence
indicators are ignored in cases of parallelism (for
opposing-ray items).

CFS. Two compound features are said to be similar if and only
if all the items in the CF-definitions are identical,
except that convergence-divergence indicators (for
opposing rays) are ignored in cases of parallelism or
collinearity.

Figure 5.6 motivates the comparison exceptions in parallelism cases.
"=180"-bits are ignored here, simply because they are not reliable.

Practically, i.e. in the program, comparisons are performed through
appropriate masks, using logical operations and shifts. Thus the
feature handling is very efficient, and is somewhat in the nature of
"hardware". From similarity tests we get information about relative
magnitude of tested features, in the case where they are not similar.
This is used as a basis for the internal ordering of features, as well
as for the ordering of central feature reference storage.
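
The masked comparison can be sketched as follows. The bit layout used
here is purely hypothetical (the text gives only the 18-bit halfword
size, not the field positions or widths), and the LF7 item is reduced to
two assumed bits, "parallel" and "converging". The sketch shows how one
masked comparison yields both the similarity test of LFS and a relative
magnitude for ordering the halfwords.

```python
# Hypothetical layout of one LF halfword (13 of the 18 bits used).
FIELDS = [("flag", 1), ("n_over", 3), ("near_over", 1), ("n_under", 3),
          ("near_under", 1), ("outside", 2), ("parallel", 1),
          ("converging", 1)]

SHIFT, MASK = {}, {}
_pos = sum(width for _, width in FIELDS)
for _name, _width in FIELDS:
    _pos -= _width
    SHIFT[_name] = _pos
    MASK[_name] = ((1 << _width) - 1) << _pos
ALL = (1 << sum(width for _, width in FIELDS)) - 1


def pack(**values):
    """Pack named items into one halfword; unspecified items are 0."""
    word = 0
    for name, value in values.items():
        word |= (value << SHIFT[name]) & MASK[name]
    return word


def compare(a, b):
    """Return -1, 0 or 1; 0 means the halfwords are similar (LFS)."""
    # The "=180" items are always ignored.
    mask = ALL & ~(MASK["near_over"] | MASK["near_under"])
    # Convergence/divergence is ignored when both are parallel.
    if a & b & MASK["parallel"]:
        mask &= ~MASK["converging"]
    x, y = a & mask, b & mask
    return (x > y) - (x < y)


def similar(a, b):
    return compare(a, b) == 0


def ordered(h1, h2):
    """Internal ordering: least halfword first."""
    return (h1, h2) if compare(h1, h2) <= 0 else (h2, h1)
```

A design note: because the mask depends on the pair being compared, the
ordering is only used between halfwords that have already been found
non-similar, as the text describes.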


Figure 5.7 illustrates the situation of similar features but different
shapes for non-trihedral objects. The features in that figure are all
similar.


In the case of trihedrals we preserve shape relations to a greater
extent, see Figure 5.8. The cases where we do not have full convergence
information for combinations of rays are those where we have 3 or 4 rays
extending from the same side of the parent line.

[Figure 5.6: Feature similarities. We want the object to match the
prototype. Thus LFI1 and LFI2 should be judged similar. Lsc1 and Lsc2
are diverging, but approximately parallel. Lp1 and Lp2 are converging,
and approximately parallel.]

Let me point out, once more, that we do not rely on the
(partial) shape information about regions.

In the context of [...] we refer to similar, or more precisely to
"partially similar", line-features, a concept which is used for
prediction purposes within that process.


5.4 NON-DIRECTIONAL FEATURES


Almost all commonly encountered (non-degenerate) features are
directional, i.e. the half-words are not similar. Another term for this
is "ordered". We shall see now what similarity of the halfwords would
imply about the line-constellations. Intuitively we would expect
symmetries, and that is basically the case, at least for the LF. We
shall show that there is no such thing as an unordered CF.

In the case of the non-directional LF the two halves are essentially the
same, traversed in opposite directions, so that, topologically at least,
we get a rotational symmetry around the center of the parent line.

Figure 5.9, part (a), demonstrates - through stepwise build-up - the
fact that in order to be non-directional, a convex trihedral line-feature
must have exactly two rays at each vertex and on each side of
the parent line, with ray convergence and outside angles in agreement.
The figure also indicates some of the reasons for the various steps.
The same figure, part (b), shows a non-trihedral example of the general
topologically rotationally symmetric case.


We now prove the following interesting property for compound features:


Theorem 1:

All compound features are directional (internally ordered).

This theorem will be a direct consequence of the following assertion:

NOCF. If a compound feature is to be non-directional, the
two parent lines must be collinear.

Proof of assertion NOCF:

Let us assume that the parent lines are not collinear (see Figure 5.10).
We then have:

(1) The LF:s must be similar (this is obvious from CF2).

(2) The LF:s must be non-directional.

Figure 5.10 (a) shows the parent line-pair. Let us assume, in
contradiction to (2), that the LF:s are directional. Their internal
ordering relative to the center vertex must then be the same (from
CF-def. item CF3). This is indicated by the arrows in (b) of the figure.
The lines L3, L4, L5, ... are needed one by one according to the
following argument.

The need for the existence of line L3 arises from the fact that the
center vertex represents the same constellation for both LF:s. However,
now we must have line L4, based on rays-in-between-parent-lines (CF4).
After this we need L5, by the previous argument, and so on. Clearly
this process never ends, whereby we infer a contradiction.

Thus  (2)  has  been  proven. 

Part (c) in the same figure shows the case where the LF:s are unordered.
Lines L3 and L4 are necessary because constellations for L1 and L2
(resp.) must be rotationally symmetric. Next we find that the LF:s are
no longer similar, which we try to remedy by inserting L5, L6, and L7.

From then on the argument is brought back to case (b) and the center
vertex.

We arrive at a contradiction, which proves NOCF.

All we need to do now to prove the theorem is to remember that the
LF-definition groups collinear rays on the "<180"-side (LF4), and that
we are not making a special case of the collinearity.

Lining up two copies of the final feature in Figure 5.9 (a) gives an
idea of what an unordered CF might have looked like, had it existed.

The subject of degenerate views is treated in the prototype context,
Subsection 5.8.


5.5 SOME RESULTING FEATURE IDIOSYNCRASIES

It may be seen now that the LF-encodings for the two constellations L1
and L2 in Figure 5.2 are completely different, being ordered in opposite
directions to start with, as Figure 5.11 demonstrates. This is very
well - they should be - since the two line-constellations are
essentially different. There is no way in which one of them can be made
to cover the other by a rotation-translation (in two dimensions), a
basic inherent principle of the matching process.


Taking  the  line-constellations  by  themselves,  as  conglomerates  of  lines 
in  space,  we  may  achieve  a  match  by  also  rotating  one  of  them  in  a  plane 
at  an  angle  to  that  of  the  page.  This  would  correspond  to  looking  at 
the  back  of  the  object.  In  the  general  case,  of  course,  we  know  nothing 
about  the  back  of  an  object,  and  can  make  no  assumptions  regarding  its 
features.  If  the  back  differs,  a  different  2D  prototype  is  created  for 
that view. The same line of reasoning applies to the CF:s, C1 and C2.

We have noted that CF:s are directional, and that the number of rays
from one parent line to the next, ccw. around the center vertex, is one
of the distinguishing items of information. However, the convexity of
the angle between the parent lines is not (for any direction of
traversal), and we now proceed to show the reason for omitting it.

Theorem 2:

Two similar (Subsection 5.3) compound features have the same angular
convexity of parent lines at the center vertex in the directions of
traversal.

Proof: 

Assume that all partaking features are directional. Theorem 1
(Subsection 5.4) showed that the CF:s are. The case where the LF:s are
not is handled analogously to that case in the proof of Theorem 1.

Figure 5.12 illustrates the steps below, with parts I and II showing the
actions in parallel. Assume that the CF:s consist of the LF:s LFI1 and
LFI2 (those may or may not be similar). Since LFI1 is directional, and
we know that the LF direction bits (CF3) in the CF:s must be equal, the
two instances of LFI1 are both pointed the same way with reference to
the center vertices, as indicated by the arrows in the figure. But then
items LF2 and LF4 in the line-feature definition (Subsection 5.2)
necessitate the presence of the lines L1 and L2, respectively, as shown
in the second step, (b) in the figure.

Now item CF4 (orbital distance) needs line L3 in order for CFI1 and CFI2
to be similar. Then LF2 and LF4 demand L4, etc. We reach a state of
contradiction, since we can never satisfy the CF definition and the LF
definition simultaneously.

This concludes the proof.

[Figure 5.11: Two non-similar LF:s, shown with their encodings (LF-items
separated by hyphens).]


As a further illustration of feature idiosyncrasies we now prove that,
in the convex trihedral case, for three lines forming a triangle (within
a structure of other lines), we get either three similar CF:s or three
mutually different ones, never two similar, with the third non-similar
to those.

Proof: 

(Refer to Figure 5.13). Part (a) illustrates the case of one LF and one
CF only. Now (part (b), rays omitted) assume that the CF:s L1&L3 and
L2&L3 are similar (other cases are treated analogously), with two
different LF:s present. Then, by Theorem 2, we get a contradiction on
angular convexity.

It follows that the LF:s must all be the same, and that the CF:s are
traversed L2&L3, L2&L1, as shown in part (c) of the figure. In order to
have L1&L2 non-similar to the others, we must add at least one line
somewhere, say L4. Part (d) exemplifies this, for a direction of LFI1
of L1. But if the line-features are all similar, we must insert L5 and
L6 (and then more), as shown in (e). However, this would be impossible
in the case of a convex trihedral, unless the extra lines are positioned
symmetrically. But in that case the CF:s would all be similar again
(part (f)). The case where LFI1 is non-directional is treated
analogously.

This concludes the proof.

Figure 5.14 shows why the assertion does not hold for non-trihedrals.
The lines L1, L2, L3 all have identical LF:s. The CF:s L1&L2 and L1&L3
are similar, whereas L2&L3 is in a class by itself. The discriminating
item here is CF5 in the feature definition.

We now proceed to a discussion of projective invariance.


5.6  PROJECTIVE  INVARIANCE 


This  is  a  basic  idea  behind  depending  on  one  single  2D  prototype  for 
each  essentially  different  view  of  an  object  (cf.  Section  12),  and 
behind  the  construction  of  the  features  (and  prototypes). 

Definition: 

Let  C2  stand  for  the  total  class  of  two-dimensional  perspective 
projections  in  which  the  same  given  faces  of  an  object  are  visible. 

We shall show that both LF:s and CF:s are projectively invariant over
C2, given certain rather liberal constraints.

Referring back to the feature definitions (Subsection 5.2), it is easily
seen that the following LF-items are invariant over C2:

PI1. LF2 and LF4 (constrained to trihedrals).

PI2. LF2+LF4 (not constrained to trihedrals).

PI3. LF3 and LF5 (in the case of strict equality, and then
constrained to trihedrals). These bits are ignored for
present purposes (used for degenerate views, see next
section).

PI4. LF7 (under reasonable projective constraints, see below).

Thus the complete LF is projectively invariant over C2, with the
constraints above. It follows that (with the same reservations) the
following CF-items are invariant:

PI5. CF2 and CF3.

PI6. CF4.

PI7. CF5.

That is, the complete CF, as well, is invariant under the same
conditions.

Proofs of essential points above:

PPI1. Otherwise a ray would have to shift over the extension
of the parent line, which means that a previously seen
face of the object would disappear. This is not
necessarily true for non-trihedrals, as shown in Figure
5.15, where the top and bottom views require different
models.

PPI2. By the same argument (also in the general convex case).

The reasonable constraints for PI4 are:

If  the  rays  are  parallel  in  space,  the  projections  should  still  be 
within  a  liberal  tolerance  of  being  parallel.  This  is  true  where  we  are 
not  too  close  to  the  degenerate  case. 

Otherwise the two rays, or their extensions, intersect somewhere (only
the case where there really are two rays on the same side of the parent
line is practically relevant), and we stipulate that the triangle with
the base line as one side and the intersection of the rays as the
opposing vertex not extend beyond the plane of the observer (lens
plane).

The latter case is illustrated in Figure 5.16. The shaded area shows
(in 2D) where the observer must be situated in order for the projection
to stay in C2. The object might be a truncated wedge, seen (by the
observer, not the reader) from somewhere above and beyond. We note that
for observer O1, the line L1 seems longer than L2, and thus that the
connecting lines would seem to converge away from this observer. For
observer O2, however, L1 seems shorter than L2, and the connecting lines
seem to diverge, as they should, and as the two-dimensional prototype
should indicate.

The condition for PI4 is always sufficient in order to preserve the
convergence - divergence properties as expressed in the projection of
the triangle. If we get closer to the object, divergence may degenerate
into convergence (or vice versa), as we have seen. In the practical
case this condition should very seldom be violated, and if so, then for
"border-line" objects with edges deviating only narrowly from being
parallel.
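
The second constraint for PI4 can be checked numerically. The sketch
below works in the plane only, in the spirit of the 2D illustration in
Figure 5.16, and assumes that the observer is described by a position
and a viewing direction, with the lens plane being the line through the
observer perpendicular to that direction. All names are hypothetical.

```python
def intersect(a, da, b, db, eps=1e-9):
    """Intersection of the lines a + t*da and b + s*db in the plane,
    or None if the lines are (numerically) parallel."""
    det = da[0] * db[1] - da[1] * db[0]
    if abs(det) < eps:
        return None
    t = ((b[0] - a[0]) * db[1] - (b[1] - a[1]) * db[0]) / det
    return (a[0] + t * da[0], a[1] + t * da[1])


def satisfies_pi4_constraint(a, da, b, db, observer, view_dir):
    """True if the triangle formed by the base line (a, b) and the
    intersection of the two rays lies wholly in front of the lens
    plane. Parallel rays have no triangle and pass trivially here."""
    apex = intersect(a, da, b, db)
    corners = [a, b] if apex is None else [a, b, apex]
    return all(
        (x - observer[0]) * view_dir[0] + (y - observer[1]) * view_dir[1] > 0
        for x, y in corners
    )
```

An observer standing between the base line and the intersection point,
looking toward the base line, has the apex of the triangle behind the
lens plane, which is the situation in which convergence and divergence
may be interchanged.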


5.7  SPECIFICITY  AND  FEATURES 

It  may  be  of  some  interest  to  dwell  shortly  on  the  question  of  how  much 
or  how  little  information  we  would  want  a  feature  to  contain,  leaving 
aside  the  considerations  of  convenience  in  storage  and  handling,  etc. 
The  contention,  naturally,  is  that  the  LF  and  CF  contain  the  proper 
amounts,  apart  from  being  obviously  convenient  to  handle. 

Looking at the LF, there really isn't much more information around,
given that we want to preserve the projective invariance. But two
additional items might be included, namely:

LFA1. Connectivity of outgoing rays (ex. triangle).

LFA2. Convergence-parallelism-divergence for all combinations
of rays on the same side of the base-line (not just the
two opposing rays).

The reasons we do not want LFA1 are, first, that (in the scene
representation) the connectivity of the rays may be obscured by other
objects, in which case the features would not match - and, secondly,
that there really is no need for it in the matching program, since that
process uses the connectivity of the prototype as a template.

LFA2 might be more useful, mostly in the case of non-trihedrals. For
example, in Figure 5.17 the line-feature of the bottom line is the same
for all three objects. Implementing LFA2 would be slightly painful,
since we get a combinatorial amount both of storage and of handling.

The second reason against LFA1 is still a good one here, complemented by
the fact that the prototype acquisition program generalizes on
parallelities, and that such information is used in the mapping process,
as we shall see.

We would not want less, on the other hand, since all the information is
necessary (to the present system) in order to provide power of
discrimination and prediction (Section 6 - prototypes, and Section 9 -
matching).

Similar arguments hold true in the case of the CF. In Section 6 we
shall deal with some additional aspects of features, such as their
uniqueness properties as keys into prototypes.



6.0  PROTOTYPES 


6.1  GENERALITIES 


The handling of prototypes, like features, is fully generalized and
automatic. This is true for acquisition as well as for their use in the
matching process. The prototypes are perspectively consistent
two-dimensional representations of views of objects in space. All
objects are assumed planar-faced and convex.

We are not imposing a restriction to trihedral objects, but additional
prototypes may be required here, as we have seen in Subsection 5.8.

Note that the restriction to convex objects has nothing to do with the
basic structures of the features and prototypes. Those are quite
general and would handle concave objects as well. No, the reason for
this restriction is simply that concave objects give rise to an
abundance of weird views (self-occlusions, vertex coincidences, edge
alignments, ... you name it), each of which would require its own 2D
prototype. They would also introduce difficulties in the form of
partial matches.

Those circumstances would rapidly make the parsing strategy unworkable,
due to overwhelming combinatorics. Experiments with an L-beam (two
equally wide parallelepipeds "glued" together into an L) have borne this
out.

An extension to concave objects may be based on regarding such objects
as composed of several convex parts (cf. [Roberts] in Subsection 3.1
and Section 12).

The models are based on extended topological equivalence (including
convergence- and parallelism-properties) and are therefore very general.
Thus one single prototype is used to represent all non-degenerate views
of parallelepipeds, including skewed ones. The final object
classification, in an extended scheme, would take place in a context of
three-dimensional models.

It would have been possible to use 3D models exclusively, with a
projection-generating, feature-extracting program working directly on
those, and using back-projections for purposes of matching. This
possibility has been tested theoretically and, while conceptually
appealing, would seem to introduce additional difficulties concerning
generality and sensitivity to error. A discussion of related subjects
can be found in Section 12.


6.2  INTERNAL  REPRESENTATION 


The internal structure of a prototype is based mainly on its constituent
lines, and that information (below) is stored in a shared structure,
pointed into by every prototype, since the number of lines varies
between models.

The following basic items are stored for each model:

P1. Name (string).

P2. Number of vertices.

P3. Number of lines.

P4. Pointer into line-reference storage.

The following is the information for each line in a prototype (3 words
of storage):

PL1. End-vertices.

PL2. Pointers to next lines (ccw.) at end-vertices.

PL3. Which side (if any) is part of the object contour.

PL4. Line-feature equivalence class (explained below).

PL5. Line parallelity class and length class (also explained
below).

PL6. Line-feature identifier, and the LF itself for easy
reference.

The lines are ordered (directed) the same as their line-features, and
both lines and vertices are assigned an internal labelling. This makes
PL1 through PL3 meaningful, and makes it possible to reference every
element of a model.
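
The layout of items P1 through P4 and PL1 through PL6 can be pictured
with the following sketch. It is only a modern rendering under stated
assumptions: the field names, the use of list indices as "pointers", and
the single shared list `LINE_STORE` are hypothetical, and the packing
into three machine words per line is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PrototypeLine:
    end_vertices: Tuple[int, int]     # PL1: internal vertex labels
    next_lines_ccw: Tuple[int, int]   # PL2: next line (ccw.) at each end
    contour_side: Optional[str]       # PL3: "left", "right" or None
    lf_equivalence_class: int         # PL4
    parallelity_class: int            # PL5 (also the basic length class)
    is_long: bool                     # PL5: long or short category
    lf_identifier: int                # PL6: index into central LF list
    lf_word: int                      # PL6: the LF itself


@dataclass
class Prototype:
    name: str                         # P1
    n_vertices: int                   # P2
    n_lines: int                      # P3
    first_line: int                   # P4: index into LINE_STORE


# Shared line-reference storage, pointed into by every prototype.
LINE_STORE: List[PrototypeLine] = []


def add_prototype(name, n_vertices, lines):
    """Append the lines to the shared store and return the model header."""
    first = len(LINE_STORE)
    LINE_STORE.extend(lines)
    return Prototype(name, n_vertices, len(lines), first)


def lines_of(proto):
    """All lines belonging to one prototype."""
    return LINE_STORE[proto.first_line:proto.first_line + proto.n_lines]
```

Keeping the lines in one shared store lets each model header stay a
fixed size even though the number of lines varies between models.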


6.3  LINE-FEATURE  EQUIVALENCE  CLASSES 


Let us for a moment contemplate the somehow familiar object in Figure
6.1. Let us assume that, somewhere in the scene, we have found the CF
depicted at (b) in the same figure. The natural thing to say is: "Aha,
it fits precisely on LP1 and LP3 in the model.". This is true, but the
CF fits equally well on LP3 and LP2, or on LP2 and LP1. These are
distinct lines in the internal representation of the prototype, as was
noted above.

Looking at the figure, however, one realizes that all of those three
initial mappings are equivalent, in the sense that the topology context
(including parallelisms and convergence properties) is the same for each
one of the three line-pairs. This can be clearly seen by turning the
page around, and using in turn LP4, LP5, and LP6 as the bottom line of
the object. Note that the different parallelepipeds in Figure 6.2
essentially (but for proportions) differ only in the angle of viewing.
They are in the same C2 (Subsection 5.6).

This leads us to the conclusion that, having investigated the mapping
L1&L2 -> LP1&LP3, there is absolutely no sense in bothering with
L1&L2 -> LP2&LP1, since the result will be no different from the first
one.

We say that the line-features of LP1, LP2, and LP3 are equivalent, or in
the same equivalence class. This is true also for the aforementioned
CF:s, but that fact is not explicitly recorded in the prototype itself
(since it contains no CF storage), only implicitly in the central
feature reference pointer structure (Subsection 6.5). CF:s are not
formally assigned equivalence classes. Of course the equivalence of
CF:s is contingent on the equivalence of their LF:s, as defined below.

Note that the concept of equivalence class is meaningful only in the
context of a specific prototype. We proceed now with the formal
(recursive) definition.


DEFINITIONS: 


Two lines (line-features) of a prototype are said to have the same
equivalence classification if and only if the line-features are similar,
and all lines attached to the two given lines (in the proper ccw.
order around the vertices, and in the direction of the LF) belong
pairwise to the same equivalence classes.

Two line-pairs (compound features) are said to be in the same
equivalence class if and only if their respective compound feature words
are similar and their constituent line-features, taken in the order of
the internal orderings of the CF:s, belong (pairwise) to the same
LF-equivalence classes.

The following algorithm is used in the prototype analyzer to determine
the equivalence classes for line-features.

ALGORITHM:

A. Give the lines an initial assignment of tentative equivalence
classes, a different class for each different line-feature
type, so that the initial condition (feature similarity) is
satisfied.

B. For each equivalence class, EQ, the first encountered line is
now assumed correctly classified. Go down the list of other
lines belonging (so far) to EQ, checking whether they conform
to the definition, using the original line as a template. If
a line does not conform, make a note of this, but do not at
this stage change the classification.

C. If there are no changes noted, exit. Otherwise change all
marked lines, so that all such lines of a given EQ are
assigned the same, new, equivalence class. Iterate from B.

Note that we cannot effect changes as soon as the need is seen, since we
might encounter a situation where such action would partially change
some EQ assignment, thereby obscuring the fact that some pair of
differently grouped lines should really have the same classification.

As a matter of taste, we could make the change between equivalence
classes, but it doesn't make much difference, and the present algorithm
is convenient for programming reasons.
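
Steps A through C can be sketched as a small refinement loop. The sketch
assumes a simplified input that is not spelled out in the text: each
line carries a key `lf_types[i]` such that equal keys mean similar
line-features, and `neighbours[i]` lists the indices of the lines
attached to line i, already arranged in the proper ccw. order and in the
direction of its LF. Both names are hypothetical.

```python
def equivalence_classes(lf_types, neighbours):
    """Return one equivalence-class number per line."""
    # Step A: one tentative class per line-feature type.
    ids = {}
    eq = [ids.setdefault(t, len(ids)) for t in lf_types]

    while True:
        # Step B: the first line met in each class is the template.
        # Non-conforming lines are only noted; eq is not changed here.
        template = {}
        marked = {}
        for line, cls in enumerate(eq):
            signature = tuple(eq[n] for n in neighbours[line])
            if cls not in template:
                template[cls] = signature
            elif signature != template[cls]:
                marked.setdefault(cls, []).append(line)

        # Step C: exit if nothing was noted; otherwise all marked
        # lines of a given class get the same, new class.
        if not marked:
            return eq
        next_id = max(eq) + 1
        for lines in marked.values():
            for line in lines:
                eq[line] = next_id
            next_id += 1
```

Each pass that notes a change increases the number of classes, which can
never exceed the number of lines, so the loop terminates.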


We  now  proceed  to  prove  the  correctness  of  the  algorithm,  and  also  that 
it  provides  the  minimum-spread  such  c I assi f i cat i on,  i.e.  that  two  lines 
are  classified  differently  if  and  only  if  they  do  not  belong  to  the  same 
EQ,  according  to  the  definition. 

PROOF  OF  ACCURACY  OF  THE  ALGOR] THU: 

From  the  fact  that  step  8  of  the  algorithm  analyzes  the  complete 
orototype,  testing  the  EQ-c I  ass  i  f  icat  ions  for  all  lines  according  to  the 
recursive  definition,  it  follows  that  those  classifications  are  in 
accordance  with  loe  definition,  when  the  algorithm  is  exited.  Otherwise 
step  B  would  be  reiterated. 

On  the  other  hand,  all  lines  with  an  initial  assignment  to  an 
equivalence  class  ure  assigned  the  same  nu.w  classification  if  and  on'y 
if  they  do  not  conform  to  the  first  line  of  that  class.  Furthermore, 
the  changes  are  performed  at  the  same  time,  just  before  iteration,  and 
do  not  influence  the  conformity  tests  in  step  B.  Thus  it  is  impossible 
to  exit  with  two  lines  classified  differently,  unless  they  should  be. 
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G.4  PARALLEL  I  TV  AND  LENGTH  GENERALIZATIONS 

The prototype analyzer generalizes on two things particularly and
explicitly (besides those generalizations inherent in the features),
viz. parallelity and length, in the following restricted sense:

G1.  Two lines in a prototype are said to be in the same
     parallelity class if and only if the smallest difference
     between their angular arguments in some direction is less
     than some given limit, currently 5 degrees.

G2.  Two prototype lines are said to belong to the same basic
     length class if and only if they are in the same
     parallelity class (length class = parallelity class).
     However, we allow two length-categories, one long and one
     short, within each length class. Any line (within some
     length class) will be assigned to the longer category if
     and only if it is longer than 1.25 times the length of the
     shortest line in that length class.
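
The two definitions G1 and G2 can be illustrated by the following minimal
sketch. It is not the program's actual code. We assume that a line is
given by its two end points, that directions are compared modulo 180
degrees, and that a line joins the first class whose first member it is
parallel to (the text does not say how chains of nearly parallel lines
are resolved). All names are hypothetical.

```python
import math

PARALLEL_LIMIT_DEG = 5.0   # limit quoted in G1
LONG_FACTOR = 1.25         # factor quoted in G2


def direction_deg(line):
    """Direction of an undirected line, folded into [0, 180) degrees."""
    (x1, y1), (x2, y2) = line
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def length(line):
    (x1, y1), (x2, y2) = line
    return math.hypot(x2 - x1, y2 - y1)


def angular_difference(a, b):
    """Smallest angle between two undirected lines, in degrees."""
    d = abs(direction_deg(a) - direction_deg(b))
    return min(d, 180.0 - d)


def parallelity_classes(lines):
    """Group lines greedily; each line is compared with the first member
    of every existing class (an assumption, see the lead-in)."""
    classes = []
    for line in lines:
        for cls in classes:
            if angular_difference(cls[0], line) < PARALLEL_LIMIT_DEG:
                cls.append(line)
                break
        else:
            classes.append([line])
    return classes


def length_categories(cls):
    """G2: 'long' iff longer than 1.25 times the shortest line in the class."""
    shortest = min(length(l) for l in cls)
    return ["long" if length(l) > LONG_FACTOR * shortest else "short"
            for l in cls]
```

Note that the pairwise relation in G1 is not transitive, so any grouping
into classes involves a choice of this kind.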


The chief reason for the use of parallelity classes is prediction, where
we may have to know the approximate direction of a missing line in order
to insert a tentative one, or the direction we expect a new line to
have, in order to be able to discard one that deviates too much. This
is not always possible on the basis of the line-feature data alone (the
only feature used throughout the mapping), since the parallel lines may
sometimes not be simply connected. It is also convenient for easy
referencing.

There are two basic reasons for the introduction of length classes. The
first one is that knowing the approximate length of a line, we may be
able to quickly decide whether to believe in it, or to look for an
extension, or if it seems necessary to divide the line and use only part
of it.

The second reason is that it gives us a more tangible hold on
perspective, since perspective deformations have less effect on relative
lengths within parallelity classes than they have on angles. Figure
6.3, part (a), shows this clearly. The lines L3 and L4, while parallel
in space, have an angular difference of about 45 degrees, whereas the
effect on the relative lengths of the parallel lines L1 and L2 is much
slighter (somewhat awkwardly expressed), L1 being about 1/8 longer than
L2. Thus the relative lengths of L1 and L2 would not refute the
assumption that L3 and L4 are parallel, which the prototype demands.

Using  the  angle  alone,  we  would  have  to  set  the  discriminator  very 
liberally,  thereby  likely  introducing  erroneous  assumptions  elsewhere. 

The truncated wedge in Figure 6.3, part (b), indicates the reasons for
introducing length sub-classes. We are assuming that we will not be
dealing with objects that would necessitate more than two such
categories.

We shall sometimes talk of "equality-classes" as a collective term for
these generalizations.

The concepts above (reasons for use of -) will become clearer further
on (Section 9), in dealing with the matching from scene elements to
prototypes. We shall now briefly return to the feature structure.


6.5  CENTRAL  FEATURE  REFERENCE  STRUCTURE 

The following is a description of how the feature table is built up,
with reference to prototype access. Use is made here of the concept of
equivalence class, so that redundancies are avoided.

Figure  6.4  shows  the  details  of  this  central  feature  reference  list 
structure.  This  storage  is  a  complete  ordered  array  of  all  features 
found  in  the  prototypes,  augmented  by  pointer  structures  for  references 
back  to  the  prototypes.  The  prototype  analyzer  ascertains  that  there  is 
exactly  one  reference  from  each  different  line-feature  to  each  model 
that  contains  that  specific  LF,  and  to  some  line  belonging  to  each 
equivalence  class  of  that  LF,  within  the  prototype. 

In  the  case  of  CF:s,  we  make  sure  there  is  exactly  one  pointer  to  each 
line  in  the  pair  of  the  CF,  with  similar  restrictions  to  avoid 
redundancy. 

The reference list also contains pointers to all CF:s encountered in the
scene at any given time of analysis. Therefore the parsing program
simply goes down the lists, exploring feature matches in order of
decreasing feature complexity, essentially investigating all initial
mapping possibilities.
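
As an illustration only, the access pattern described above may be
sketched as follows. The class and method names are hypothetical, and a
Python dictionary stands in for the pointer structures of the actual
storage. The sketch keeps exactly one reference per combination of
feature, prototype and equivalence class, and lists the features in order
of decreasing complexity.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureEntry:
    name: str
    complexity: int                            # number of rays
    # (prototype, equivalence class) -> one representative line
    refs: dict = field(default_factory=dict)


class FeatureTable:
    """Hypothetical stand-in for the central feature reference list."""

    def __init__(self):
        self.entries = {}

    def register(self, name, complexity, prototype, eq_class, line):
        entry = self.entries.setdefault(name, FeatureEntry(name, complexity))
        # setdefault keeps the first line seen for this class, so repeated
        # registrations never create a redundant reference.
        entry.refs.setdefault((prototype, eq_class), line)
        return entry

    def by_decreasing_complexity(self):
        return sorted(self.entries.values(), key=lambda e: -e.complexity)
```

A second registration for the same feature, prototype and equivalence
class leaves the table unchanged, which is the redundancy restriction
stated above.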


Figure 6.4

Central feature reference storage

(Diagram fields: feature reference word with complexity (number of rays),
"feature is unordered?" and "tried?" flags, pointer to scene substructure,
and pointer to prototype substructure; substructure items with pointers
back to prototype and to feature, and the number of items in the sublist;
in the case of a compound feature, the first and second prototype lines
with their line-ends at the intersection; in the case of a line-feature,
the prototype line and its LF equivalence class.)


We shall return to these subjects in the context of the parsing process
(Section 8), which among other things describes the feature extraction
over the scene.


6.6 PROTOTYPE ACQUISITION


As has been pointed out, the acquisition (or "learning") of a new
prototype is fully automated, and all prototypes are treated exactly the
same. The following is an account of the steps in the input of a new
prototype.


IP1.  Input perspectively consistent line-drawing.

IP2.  Analyze this line-drawing, using the pre-processing
      package.

IP3.  Call the prototype analyzing program.

IP4.  Flush the line-drawing and associated data-structures
      (created in IP2).


Main actions performed by the prototype analyzer are:

PA1.  Classify constituent lines in terms of line-features. If
      heretofore unknown features are encountered they are
      added to the central feature list.

PA2.  Create compacted topological data-structure for the
      model.

PA3.  Find LF-equivalence classes, parallelity classes, and
      length categories.

PA4.  Update LF pointers in the central reference list, so that
      it contains one reference to this prototype (and a line)
      for each combination of LF and LF-equivalence-class.

PA5.  In parallel with PA4 find CF:s, and update the central
      feature list as in PA1, and also update pointer
      structures similarly to PA4 (Subsection 6.5).

The  following  are  some  comments  to  clarify  steps  above. 

In IP1 the line-drawing is given by providing (from the console or via a
file) the end coordinates for all participating lines. Care must be
taken to obtain approximate parallelisms where such are desired, and to
avoid them where unwanted. By "perspectively consistent", we mean that
spatially parallel lines should be adjusted length- and angle-wise by
some small amount in order to indicate a perspective deformation, since
that concept is used in the matching program (Section 9) (and only
there).

Sometimes we cannot generalize on such perspective deformations, namely
if those object faces (that contain the parallel lines) form an exterior
angle of less than 270 degrees, in which case we get a dependence on
orientation. Figure 6.5 demonstrates this state of affairs, in the case
of a skewed parallelepiped.

Such line-drawings could be generated automatically in a full-fledged 3D
system, as indicated in Section 12 (future possibility).

Step IP2 entails finding the vertex connections and setting up the
normal cross-reference data-structure. Such things are treated later,
in Section 7.


After the learning of a prototype, all that remains is the internal
representation, not the line-drawing. This data-structure (for all
current models) may then be conveniently saved on auxiliary storage. We
may thus have different sets of models, which can be used easily and at
will. One may conceive of a future system that makes some intelligent
use of such different sets of prototypes, trying a new set if the
current one seems to yield unsatisfactory results. It would to some
extent be able to accommodate itself to the surroundings. However, there
is no use for such a scheme in the present system, but possibly in a
more sophisticated one, where we utilize three-dimensional models and
have access to depth information in the analysis of the scene (Section
12).


Skewed parallelepiped resting on table.

Orders of apparent lengths of L1, L2, and L3 for the
observers at O1, O2, and O3, respectively:

O1: L1-L2-L3    O2: L2-L3-L1    O3: L3-L2-L1

The observers are all thought to be in a plane parallel
to a plane through the center points of L1, L2, and L3.


Figure 6.5

Orientation dependent perspective deformation


Another  possibility  is  to  have  the  program  learn  new  prototypes  by 
"consistent  encountering",  i.e.  by  finding  something  new  a  sufficient 
number of times to conclude that it probably is some object it should
know  about.  Such  a  scheme  is  nice  because  it  is  more  general,  but  it  is 
also  more  error-prone,  since  we  would  not  necessarily  encounter  perfect 
(enough)  instances  of  the  object  projections. 


In  an  extended  scheme  (3D)  the  prototypes  would  be  given  by  the  end- 
coordinates  of  their  edges,  and  the  acquisition  program  would  generate 
all different views of the object in question, creating a new 2D model
whenever the current projection does not map onto any of the existing 2D
prototypes. 


The  next  two  sub-sections  deal  with  the  currently  used  set  of  models, 
and  will  provide  some  discussion  of  the  extent  to  which  objects  can  be 
conveniently  and  unambiguously  represented  through  the  prototype  and 
feature schemes given here.


6.7 CURRENTLY USED PROTOTYPES

In this subsection we refer to Figure 6.6, which provides a set of the
most useful (and realistic) models. The most often used prototypes are
given in the following table (for a definition of "degenerate", see
Subsection 6.8).


M1.   PAREP:    Parallelepiped (non-degenerate).

M2.   WEDGE:    Wedge (non-degenerate).

M3.   DPAREP:   Parallelepiped (degenerate).

M4.   DWEDGE:   Wedge (degenerate).

M5.   TWEDGE:   Truncated wedge (auxiliary model).

In other words we have four 2D prototypes, which represent all possible
views of our two different objects (not counting the TWEDGE). The
choice of objects was based on their simplicity and regularity. Of
course, one might want a more varied set of models, such as a
tetrahedron, truncated objects, etc. The truncated wedge has been used
from time to time, experimentally. It is not currently an active
prototype.

The fact that one of the models (the PAREP) may be thought of as
composed of two instances of another (the WEDGE) tests the
discriminatory powers of the system, since it introduces partial
matches. We shall get back to that topic later, in Section 9 and
Section 11.


Figure 6.6

Current and auxiliary prototypes


6.8  DEGENERATE  VIEWS 

As  we  have  seen  (Figure  6.6),  the  prototypes  contain  representations  of 
degenerate  views  as  well  as  "normal"  ones.  A  degenerate  view  is  here 
defined  as  one  in  which  there  is  no  vertex  where  more  than  two  side- 
regions  meet.  Usually  such  a  view  is  one  where,  for  that  orthogonal 
projection  which  shows  the  same  sides  of  the  object,  rotating  the  object 
a  small  angle  around  some  axis  would  change  the  topology  of  that 
orthogonal projection. Note that with suitable projective constraints
(Subsection 5.6) there is always such an orthogonal projection.

We shall use the term "perspectively degenerate" in the case where a
similar rotation would change the topology of the perspective
projection. We shall sometimes use the obvious abbreviations D-view and
PD-view.

Thus (a) in Figure 6.7 shows a degenerate parallelepiped and wedge,
whereas (b) represents perspectively degenerate views of the same
objects. Note that the term degenerate is used somewhat inconsistently
with its usual meaning in cases like the wedge. It was chosen for
convenience.

So, in the present system, degenerate views are represented by
degenerate models. However, we cannot do the same for perspectively
degenerate views, since in those cases (cf. L1 and L2 in the same
figure) we do not often find the initial lines representing degenerate
planes unbroken. On the contrary, they are often split into two or more
parts which form small angles with one another. This often makes it
difficult to decide whether we are dealing with a PD-view or not.


As an added attraction, we often get views like those in Figure 6.7,
part (b), due to occlusions. In most cases, however, such an occluding
object should be better matched to some prototype and thus disappear,
leaving us with a partial mapping (part (b), with L1 and L2 gone).


Therefore PD-views are one of the problems in the present system, as
indeed they would tend to be (I suspect) in any vision system dealing
with the real world. They have to be regarded as special cases of
D-views. On the other hand, we want to be able to pick up the marginally
non-degenerate cases, as indicated in part (c) of the same figure.


What makes the problem hard is partly that a very slight change in the
data may result in a dramatic change in topology. The other unfortunate
circumstance is a consistent lack of helpful edge-information in such
areas, due to their narrowness. This makes it hard to verify predicted
line-elements. A slight amount of ad-hockery has been necessary in
order to detect these cases and channel them into the proper prototypes.
This is done by channeling border-line instances (where an outer angle
of an LF is between 180-Alpha and 180 degrees, Alpha currently being set
to 7.0) into the degenerate case, rather than trying to complete them as
regular, almost degenerate objects. The reason for this is partly very
practical, since some subroutines for intersections, collinearities, and
the like, get fouled up when dealing with a region that has been
squeezed almost into a line. Nevertheless, there is a global switch to
enable this scheme.
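
The border-line test can be stated compactly. The following is a sketch
under assumptions: the text does not say whether the bounds are
inclusive (open bounds are assumed here), and the function names and the
Boolean flag standing in for the global switch are hypothetical.

```python
ALPHA_DEG = 7.0              # value quoted in the text
CHANNEL_BORDERLINE = True    # stands in for the global switch (assumed name)


def is_borderline(outer_angle_deg, alpha=ALPHA_DEG):
    """True if an outer angle lies between 180 - alpha and 180 degrees."""
    return 180.0 - alpha < outer_angle_deg < 180.0


def choose_model_kind(outer_angles_deg, channel=CHANNEL_BORDERLINE):
    """Channel an instance into the degenerate case if any outer angle of
    its line-feature is border-line and the switch is on."""
    if channel and any(is_borderline(a) for a in outer_angles_deg):
        return "degenerate"
    return "regular"
```

With Alpha at 7.0, an outer angle of 176 degrees is treated as
degenerate, whereas one of 170 degrees is completed as a regular object.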


6.9  REPRESENTATIONAL  AMBIGUITIES 

It  may  be  interesting  to  make  an  assessment  of  the  extent  to  which 
objects  can  be  adequately  and  unambiguously  represented  through  the 
features  and  prototypes  suggested  here.  That  is,  are  there  objects  for 
which  the  parsing  program,  or  rather  the  prototype  matching  program, 
might  mistake  one  object  for  another? 

Of  secondary  importance  is  the  uniqueness  of  the  initial  line-mappings 
provided  by  (primarily)  the  compound  feature  and  (secondarily)  the  line- 
feature.  The  reason  this  is  not  crucial  is  that  the  matching  program 
has  the  full  power  of  decision  and  will  give  low  marks  to  bad  mappings. 

Let us look at the last question first (see Figure 6.8). The line-
feature, applied on L1 in the figure, will put all three objects in the
same class. The compound feature, applied on L1 and L2, will be able to
distinguish between (a) and (b) but not between (b) and (c). However,
the objects with which we are dealing are usually not as complicated as
that. The following table (over the four mostly used prototypes)
demonstrates the performance of the compound feature in terms of
uniqueness as initiator into mappings:


Total number of CF:s                  30

CF:s mapping into 1 prototype only    25

CF:s mapping into 2 prototypes         4

CF:s mapping into 3 prototypes         1

This  shows  that  for  the  most  commonly  useful  (uncomplicated)  objects, 
the  compound  feature  is  quite  an  accurate  guide  for  mapping 
initializations.  Of  course,  degree  of  uniqueness  is  directly 
proportional  to  complexity  (number  of  rays),  and  therefore  the  line- 
feature  is  much  less  suited  for  mapping  initializations. 
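
The figures in the table can be summarized numerically. The small sketch
below (the function name is hypothetical) computes the share of CF:s that
point to a single prototype and the average number of candidate
prototypes per CF; both quantities follow directly from the table and are
not additional measurements.

```python
def cf_uniqueness(counts):
    """counts maps 'number of prototypes a CF maps into' to 'number of CF:s'.
    Returns (share of unique CF:s, mean number of candidate prototypes)."""
    total = sum(counts.values())
    unique_share = counts.get(1, 0) / total
    mean_candidates = sum(k * n for k, n in counts.items()) / total
    return unique_share, mean_candidates


# Values from the table above: 25, 4 and 1 out of 30 CF:s.
TABLE = {1: 25, 2: 4, 3: 1}
```

For the table above this gives 25/30, or about 83 percent, of the CF:s
mapping into one prototype only, and an average of 36/30 = 1.2 candidate
prototypes per CF.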

Now  to  the  main  question: 

How  similar  do  two  object  projections  have  to  be,  in  order  for  their 
prototype  -  feature  representations  to  be  potentially  subject  to 
confusion? 

Clearly, to start with, the topologies must be the same. Furthermore
every pair of corresponding line-features has to be similar between the
two projections, so that, at every vertex and on both sides of the
extended base-line, the topologies must agree. Angular convexities must
also agree, for all line-junctions. Parallelities and relative lengths
of parallel lines must agree (within tolerances), not only as given by
the limited reach of the line-feature, but also as recorded in the
parallelity- and length-class items, which reach over the entire model.

We may therefore conclude that ambiguities in the prototype
representation of 2D projections are introduced only in terms defined by
our tolerance levels for parallelity and length-quotients.

Figure 6.9 gives an example of such ambiguities. Clearly, the models
may be constructed (and analysed) with any desired levels of tolerance,
but the crucial issue is how well we (the program) will manage to
distinguish between them in the parsing process.

Here is one case in which the perspective information (as indicated in
the prototypes) may be helpful. Thus in Figure 6.9 we may be able to
distinguish between (a) and (b), or (b) and (c). However, (c) might be
a perspectively deformed version of (a), or it might be another model (a
truncated wedge, on its head, for instance).

This is a case that should not be likely to arise in practice, and where
(if it did) we would be compelled to rely on 3D knowledge for the
decision.

This concludes the sections on features and prototypes, and before
continuing with the related topics of mapping and parsing I shall now
briefly describe the nature of initial data and necessary preprocessing
stages.


7.0 PREPROCESSING


7.1  INITIAL  DATA 


The initial input consists of an array of two-dimensional coordinates
for locations of intensity discontinuities in the TV-image. Those
points are called edges throughout this paper. That total input is
hereafter called the edge-drawing. Figure 7.1 demonstrates the
character of such initial edge-drawings.

The edge-follower is part of the Stanford Hand-Eye System. It was coded
mainly by Karl Pingle [Pingle & Tenenbaum 1971]. It uses Tenenbaum's
accommodation routines [Tenenbaum 1970] and the powerful edge-detecting
operator created by Manfred Hueckel [Hueckel 1971 & 1973]. I shall not
attempt to describe the operation of the edge-follower in any but the
following extremely broad terms.

The edge-operator consists of a variable size, approximately circular
matrix which, applied over some small area of the TV-raster, utilizes a
number of elaborate mathematical functions to obtain (basically) the
location of the edge, the intensity gradient vector, and the brightness
difference. Figure 7.2 shows an ideal edge, its intensity profile, and
the resulting operator output.


The edge-extraction is performed on a 333*256 matrix of intensity
values, each of which has 4 or 6 bits of information. A single TV-scan
results in 4 bits, but 6 may be obtained by combining several scans at
different intensity ranges.

The edge-follower makes a coarse scan over the picture until it finds an
edge, which it subsequently tries to follow until a closed curve is
found. It contains a line-fitter which it uses to obtain some idea of
the locations of the vertices in the scene. A closer edge-scan may then
be performed in some area around each of those vertices, so that other
lines may be detected. Since this may sometimes fail (due to glare,
shadows, adverse lighting conditions, etc.) there is another mode
available, in which a complete scan is performed on the inside of all
closed regions found previously. The program accommodates the
sensitivity of the TV-camera as it proceeds, so as to be able to see
better in the local area of current interest.

Alternatively we may work on stored TV-matrices, in which case
accommodation is by definition impossible, and where the quality of the
edge-drawing becomes lower (as a rule), even if 6-bit intensities are
used.


Whatever the case might be, as the next step in the processing of the
picture, the original edge-data is transformed and sorted before we
start the line-abstracting phase.

The transformation replaces each edge-point and gradient vector by an
edge-pair (see Figure 7.3, (a)), so that the direction of the local
intensity-discontinuity is from then on implicit in the vector formed by
each pair.

The sorting program creates a linkage among the edge-pairs to ensure
that each edge-pair is in the proper context in terms of closeness to
other pairs, and in terms of the angles of the pair-vectors. In other
words, the list of pairs is ordered in a way conducive to extracting the
best possible lines. The input data already to a great extent conforms
to such an ordering, but it is not satisfactory in areas near vertices,
or in other regions with complicated patterns.


7.2  ABSTRACTION  OF  INITIAL  LINES 

Looking  at  Figure  7.1,  our  (human)  vision  system  tends  to  abstract 
shapes  or  objects  from  the  data.  It  is  unclear  (to  me)  how  big  the 
chunks  of  abstracted  information  are,  but  it  seems  (judging  from  my  own 
experience)  that  we  intuitively  perceive  lines  where  the  picture 
contains  straight  arrays  of  edges,  and  that  the  patterns  of  those  lines 
are  interpreted  in  meaningful  ways. 

Be  that  as  it  may:  Line-extraction  is  the  first  point  on  the  agenda  for 
the  present  system. 

The  line-extracting  program  attempts  to  fit  lines  within  the  connected 
subsets  of  edge-pairs  resulting  from  the  sorting  phase,  and  it  uses  an 
exact least square method for the line-fit.


Figure 7.4

The line-extracting algorithm


Figure 7.4 gives the flow of the line-finder. A couple of things are
worth noting here.

1.  New edge-pairs are tested for closeness to current line
    (before least-square fit) and rejected (line stopped) if
    not close enough. This prevents wrapping around corners,
    as Figure 7.3 also demonstrates (part (b)). The least-
    square fit itself, for long lines, is not sensitive enough
    here.

2.  When we get no further in one direction, we try extending
    the line at the other end, in the same kind of process.

3.  When both directions are exhausted, we try merging the
    present line with the previous one, iteratively, before
    starting on a new line.

After all possible lines have been created, we finally clean up the
picture, removing lines that are based on an insufficient number of
edge-pairs (parameter), and shrinking each one of the rest of the lines
by an amount proportional to the quantity DEP in Figure 7.3, limited by
an amount proportional to the length of the line. This is done in order
to clean up around the vertices as much as possible before we
investigate the totality of line-intersections (next subsection).
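
Point 1 above, the closeness test that stops a line before it wraps
around a corner, may be sketched as follows. This is an illustration
under assumptions and not the line-finder of Figure 7.4: each edge-pair
is reduced to a single point, the fit is taken to be an orthogonal
(total) least-squares fit, extension is shown in one direction only, and
the names and the tolerance max_dist are hypothetical.

```python
import math


def fit_line(points):
    """Orthogonal least-squares fit: returns (centroid, unit direction)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # principal axis angle
    return (cx, cy), (math.cos(theta), math.sin(theta))


def distance_to_line(point, centroid, direction):
    """Perpendicular distance from point to the fitted line."""
    dx, dy = point[0] - centroid[0], point[1] - centroid[1]
    return abs(dx * direction[1] - dy * direction[0])


def grow_line(points, max_dist=1.5, seed=2):
    """Extend a line over an ordered point list. Each new point is tested
    against the current fit and the line is stopped at the first point
    that is not close enough."""
    used = list(points[:seed])
    for p in points[seed:]:
        centroid, direction = fit_line(used)
        if distance_to_line(p, centroid, direction) > max_dist:
            break        # line stopped: prevents wrapping around a corner
        used.append(p)
    return used, fit_line(used)
```

Testing each candidate against the current line, rather than refitting
with the candidate included, is what keeps a long line from being bent
gradually around a vertex.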

Figure  7.5  shows  the  result  of  line-extraction  on  the  edge-data  in 
Figure 7.1.


This  concludes  the  preprocessing  of  the  scene.  The  story  continues  in 
the section on parsing, which conveniently follows next.


8.0 THE PARSING PROCESS


8.1  PARSING  STRATEGY 

The word "parse" has been chosen because it describes very well what
happens in this process. The parser works iteratively, extracting one
object at a time, each time modifying the scene by removing the lines or
segments belonging to that object. "Object" is used here and elsewhere
for "object projection". The input to the parser is the initial
line-drawing, which was described in the previous section.

The diagram in Figure 8.1 shows the flow of the parsing process. The
first two blocks, A and B, may be characterized as preprocessing stages
for each iteration within the parser. They are described in the two
following subsections.


The result of the actions in block A (described in Subsection 8.2) is a
pointer structure which, although the original line-drawing is
unchanged, gives the tentative vertices based on intersection relations.
In Figure 7.5, bottom, that pointer structure has been used to show the
tentative linkage of the lines. Note that what is shown in the figure
are the connectivity relationships, using weighted vertex coordinates.
The data-structure is described in the appendix, Subsection 14.1.

The tentative topology is the basis for the next step, feature
extraction. During that process, links are created between scene-
elements and prototype-elements. Those links, called mapping keys, are
investigated by the parser (one by one) in order of decreasing
complexity until the mapping program finds a complete object or the
links are exhausted. In the former case the object is accepted
directly, otherwise the different mappings are compared, and the best
one is chosen. Complexity of a feature is measured in terms of the
number of lines involved.

The  mapping  routine  tries  to  find  as  good  a  match  as  possible,  given  an 
initial  scene-element  and  the  prototype  element  it  is  currently  assumed 
to  map  into.  On  return,  that  program  has  stored  the  best  match  (for 
that  key)  in  compacted  form  for  the  parser  to  study. 

The  parser  now  compares  it  to  the  best  mapping  it  has  found  so  far, 
updating  the  "best"-pointer  if  the  new  mapping  is  better,  otherwise  just 
stepping  the  map-storage  pointer.  Thus  all  mappings  are  remembered  at 
each  iteration  (but  not  between  iterations),  and  before  investigating  a 
new  key  it  is  easy  to  check  whether  that  key  has  already  (implicitly) 
been  used,  i.e.  if  that  line  in  the  scene  has  already  been  tried  for  the 
current  equivalence  class  and  prototype  combination. 
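
The key loop described in the last three paragraphs can be sketched as
below. This is an illustration, not the parser itself. We assume that a
key is a tuple (complexity, scene line, prototype, equivalence class),
and that the mapping routine is a caller-supplied function match
returning a score, a completeness flag, and the set of (scene line,
prototype, equivalence class) combinations its mapping has covered. All
of these names and conventions are hypothetical.

```python
def best_mapping(keys, match):
    """Investigate mapping keys in order of decreasing complexity.
    Returns (best key, best score, list of all mappings tried)."""
    tried = set()        # combinations already used, explicitly or implicitly
    mappings = []        # every mapping of this iteration is remembered
    best = None          # index of the best mapping so far
    for key in sorted(keys, key=lambda k: -k[0]):
        _, scene_line, prototype, eq_class = key
        if (scene_line, prototype, eq_class) in tried:
            continue     # this key has already (implicitly) been used
        tried.add((scene_line, prototype, eq_class))
        score, complete, covered = match(key)
        tried.update(covered)
        mappings.append((key, score))
        if complete:
            return key, score, mappings      # complete object: accept directly
        if best is None or score > mappings[best][1]:
            best = len(mappings) - 1         # update the "best"-pointer
    if best is None:
        return None, None, mappings
    return mappings[best][0], mappings[best][1], mappings
```

The list of mappings is rebuilt on every call, which corresponds to
mappings being remembered within an iteration but not between
iterations.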

Subsection  8.4  discusses  object  evaluation  (and  isolation).  We  must  be 
able  to  decide  whether  one  partial  mapping  is  better  than  another,  in 
order  to  isolate  the  best  object.  As  the  diagram  shows  (Figure  8.1), 
object  isolation  takes  place  when  all  keys  have  been  investigated,  or  a 
complete object has been found. Since an isolated object disappears
from the current scene, the topology may subsequently have changed in
some drastic way, and that necessitates a reiteration of the parser
preprocessing routines. But before that we reset all data-structures to
the initial line-drawing state.

So we may note that as far as the parser knows, each iteration deals
with a completely new scene. The program does not remember what it did
before, nor does it use its stored knowledge of previously extracted
objects. It does not worry about occlusions. A match may take place
even if it means that the object will partly cross over other elements
of the current scene.

Such information could be utilized to some extent even in the present
system, but would be fully effective only in the context of a complete,
three-dimensionally based vision system.

Figure 8.2, Figure 8.3, Figure 8.4, and Figure 8.5 show the results of
the iterative parsing process on our sample scene.

Figure 8.6 gives the collective final scene with no elimination of
hidden lines.

There could be one more process in the total scheme, namely object
completion, the idea of which would be to try combining (in turn) each
of the isolated objects with the final residual line-drawing, using the
matching program, in order to determine whether some partial object may
be completed or at least extended. Section 10 is devoted entirely to
that subject.

Now we proceed with three subsections dealing with blocks A, B, and C in
the strategy diagram, Figure 8.1.


8.2 FORMATION OF TENTATIVE VERTICES

Since  the  entire  parse  depends  on  the  initial  mappings,  and  features  are 
based  on  end-vertex  ray-constellations  for  the  lines,  we  have  to  somehow 
obtain  a  tentative  topology  in  terms  of  linking  lines  together  in 
possible  vertices.  This  should  be  done  fairly  conservatively  in  order 
to  avoid  grouping  excessive  numbers  of  lines  together,  which  would 
complicate  the  task  of  matching,  besides  possibly  destroying 
recognizable  features.  On  the  other  hand  we  do  not  want  to  be  too 
conservative,  either,  for  similar  reasons. 

Thus the formation of tentative vertices, with no global knowledge, is
of great importance. The block diagram for this process is shown in
Figure 8.7.

A cut stop (point B3 in that figure) is exemplified in part (b) of
Figure 8.8, and consists of one line (extended) running into another.
If one of the cut-off ends is short enough, a vertex could be formed
here, by assuming that the short piece may be ignored.

Block A, the formation of cross-reference tables, is the process of
mapping the relationships between the lines in terms of intersections
A:  Form intersection cross-reference tables

B:  1  Join acceptable extension-intersections
       (distances OK, no obstructing lines)
       using restrictive parameter settings

    2  Same as 1, except use full parameters

    3  Join line-ends with small cut stops if
       and only if either end is free, giving
       preference to the line with the least
       extension, if both are eligible

    4  Same as 3, except no preference

    5  Extend still free line-ends into
       closest vertices, subject to various
       distance criteria

    6  Join closest free line-end pairs, using
       liberal parameters for one of the lines

C:  Iteratively merge pairs of vertices, provided
    the distance between them is short enough

Figure 8.7

Tentative vertex formation

and collinearities. This is done for every pair of lines. For each end
of each line the following information results (refer also to Figure
8.8, where L1 is assumed to be the current line in all cases):


X1. Closest extension-intersection, and both distances, subject to
acceptability (L2, r1, r2 in part (a) of Figure 8.8).

X2. Closest collinear line, and distance (L3, ro in part (a) of the
same figure).

X3. Closest stopping line, and distances (L3, rkl, dl in part (b)).


That block (A) is iterated once more, in order to find next-best
intersections in cases where the best ones were subsequently blocked, as
illustrated in Figure 8.8, part (d). Line L3 is first associated with
L5, but later L4 is found to block that intersection, so that L3 is
grouped with L2 instead, during the course of the second iteration.

One might of course store several best intersections for each line to
begin with. My previous preprocessor did just that (Subsection 3.2).
It is basically a question of space versus time. The present scheme was
chosen because (1) subsequent blockings are not overly frequent, and (2)
"the best few" may be blocked as well, so that the extra code is still
necessary.
Once the cross-reference tables exist, vertex formation proceeds in two
steps, namely temporary vertex linking, and vertex merge. The temporary
vertices are created in an iterative process where each step is
conservative in relation to the previous, but where the end-result is as
liberal as possible without creating confusion. I shall briefly
indicate the reason for each of the six passes, using the examples in
Figure 8.8.


Pass 1 and pass 2 do the same thing, with a different extension
tolerance level. Part (f) in Figure 8.8 demonstrates why. If r1 is an
acceptable extension for pass 2, but not for pass 1, and if the maximum
vertex-merging distance is less than r1, the two vertices in (f) are
kept separate, as seems reasonable. That would not have been the case
with only one pass here, since L1 would link to L2.

Pass 3 joins L3 with L2 in (b), provided the cut, d3, is small enough,
and r1-3 is short enough. If the preference clause had not existed, L1
would have been joined with L3, and the resulting vertices would have
looked different.

Pass 4 would have joined L1 with L3, if the former hadn't already found
L5, and the latter L2 (still in part (b)).

Pass 5 will permit L3 to join the others (L1 and L2) in (c), which it
couldn't otherwise have done, assuming r1 and r3 are too great.

Pass 6, finally, allows L1 to join L2 (part (e)), provided r12 is in the
right length-bracket, but will generally not allow L1 to join L3.


The vertex merge fuses close enough vertices, subject to some
connectivity constraints. A weighted-least-square method (that takes
account of line-lengths) is used in computing the best vertex
coordinates for junctions of several lines.
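
The weighted-least-square computation can be illustrated by a minimal
sketch. The text only states that the method takes account of
line-lengths; the particular choices below (each line given by its two
end points, weight equal to the line length, error measured as the
perpendicular distance from the vertex to the infinite line) are
assumptions made for the illustration, and the name weighted_vertex is
hypothetical.

```python
import math

def weighted_vertex(lines):
    """Return the point (x, y) minimizing sum(w_i * d_i**2), where d_i is
    the perpendicular distance to line i and w_i is its length (assumed).

    lines: list of ((x1, y1), (x2, y2)) end-point pairs.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (x1, y1), (x2, y2) in lines:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue                      # degenerate line carries no information
        nx, ny = -dy / length, dx / length  # unit normal of the line
        w = length                        # assumed weighting by line length
        c = nx * x1 + ny * y1             # line equation: nx*x + ny*y = c
        sxx += w * nx * nx
        sxy += w * nx * ny
        syy += w * ny * ny
        bx += w * nx * c
        by += w * ny * c
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique vertex")
    # Solve the 2x2 normal equations by Cramer's rule.
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)
```

With this weighting a long line pulls the vertex towards itself more
strongly than a short stub, which seems to be the intent of taking
line-lengths into account.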

'There are many ways to peel the banana.' Which is the right one?
Fortunately the matching program is clever enough to be able to handle
the consequences of most of the unavoidable mistakes of the ignorant
procedure above.


Finally, let me once more stress the fact that the formation of
tentative vertices (etc) is only reflected in the connectivity, i.e.
confined to the pointer structure (Subsection 14.1), and that the
initial line-drawing is in no sense affected by this.


8.3  FEATURE  EXTRACTION 


Assuming the program in the previous subsection has done a fair job, we
should now be able to establish some links between the scene and the
prototypes, in other words, recognize certain constellations as things
we have seen before.


The  feature  extracting  procedure  is  absolutely  straight-forward. 
The line-features are extracted and compared with the centrally stored
ones, in a binary search through that ordered list. If a similarity is
found, we store that identifier (and direction-flag) with the data for
that line.
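
The look-up can be sketched as follows. This is only an illustration
under stated assumptions: feature words are taken to be integers held in
a sorted list, the identifier is taken to be the position in that list,
and the direction-flag is taken to say whether the line matched when
read forwards (0) or backwards (1). None of these encodings is given in
the text, and the function names are hypothetical.

```python
from bisect import bisect_left

def find_feature(sorted_words, word):
    """Binary search; return the index of word in sorted_words, or None."""
    i = bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return i
    return None

def classify_line(sorted_words, forward_word, reverse_word):
    """Return (identifier, direction_flag) for a scene line, or None.

    forward_word / reverse_word: the line-feature computed in the two
    possible directions of the line (an assumed representation).
    """
    for flag, word in ((0, forward_word), (1, reverse_word)):
        ident = find_feature(sorted_words, word)
        if ident is not None:
            return ident, flag
    return None
```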

The compound features are compared to central storage in similar
fashion, but as a match is found, we update that pointer structure so
that the particular CF, besides pointers to various prototypes, will now
also have a pointer to this specific instance in the scene. The central
feature reference structure was shown in Figure 6.4.

A global switch enables the following extension in the feature
extraction phase (for messy scenes). Utilizing the concept of partial
similarity of line-features, an unrecognizable feature may be listed as
a potential key, provided it can be reconciled to some centrally stored
feature by the association only, or dissociation only, of one or more
rays. Thus both adding and deleting rays simultaneously is forbidden,
since such keys would be far fetched. We also want to keep changes as
simple as possible.


For  each  unrecognized  feature  the  central  list  is  checked,  in  order  of 
decreasing  complexity,  until  a  partial  similarity  is  found  (that 
conforms  to  the  rule  above),  or  the  list  is  exhausted.  Thus  the 
recorded  partial  key  refers  to  the  maximum  complexity  feature  which  is 
partially  similar  (and  reconcilable)  to  the  one  causing  trouble. 
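
The search for a partial key described above can be sketched under a
strong simplification: a feature is modelled as a frozenset of ray
labels, and its complexity as the number of rays. The actual feature
words also encode ray order and direction, which this illustration
ignores; the name partial_key is hypothetical.

```python
def partial_key(unknown, central):
    """Return (stored_feature, action, rays) for the most complex stored
    feature reachable from `unknown` by adding rays only or deleting rays
    only, or None if the central list is exhausted.

    unknown: frozenset of ray labels (assumed representation).
    central: iterable of frozensets, the centrally stored features.
    """
    # Check the central list in order of decreasing complexity.
    for stored in sorted(central, key=len, reverse=True):
        if stored == unknown:
            continue                          # identical, not a partial match
        if unknown < stored:                  # association only
            return stored, "add", stored - unknown
        if stored < unknown:                  # dissociation only
            return stored, "delete", unknown - stored
        # Anything else would need adding AND deleting rays: forbidden.
    return None
```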

This scheme may seem arbitrary (and so it is!), but we really are only
trying to provide educated, locally based guesses at this point. There
is certainly an element of randomness inherent in any such local scheme.
A more globally oriented, context conscious scheme would be nice, but a
bit harder to design. The concept of the super-feature, discussed in
Section 12 (future work), might possibly come in useful in this respect.


There are almost always enough keys to initiate the mapping process,
since each iteration in the parser simplifies the scene and the
topology. Again, the concept of partially similar features is utilized
in the mapping heuristic, Section 9.


8.4  OBJECT  EVALUATION  AND  ISOLATION 

Object evaluation, or rather mapping evaluation, is the procedure
whereby we assess the goodness of a mapping. We need to do this in
order to be able to choose between different partial mappings to obtain
the best.

This is one of the processes that would fare exceedingly well from being
provided access to all the good things inherent in 3D consciousness
(depth, occlusion, ...).


The primary evaluation is based on the following points (number of lines
or rays subject to absolute as well as relative tests):

ME1. Completely mapped lines (both ends).

ME2. Incompletely mapped lines (rays).

ME3. Complete lines present in the scene.

ME4. Rays present in the scene.

ME5. Partially used (cut off) scene-lines.

ME6. Inserted lines and rays.

ME7. LF-testable lines.

ME8. Lines passing through vertices.

It should be fairly obvious in which directions (positive or negative)
those items contribute in the evaluation. We want as many complete
elements as possible, and we prefer that they really exist in the scene.
Partially used scene-lines are abhorred, since they may be indicative of
an object cut off to fit the mapping. An object (complete) is never
accepted "on faith" if it contains such lines. We shall see examples of
these and other exotic things later.
Finally, not all complete lines are LF-testable, that is, some of them
may contain so called "assumed" rays at their end-vertices. Those are
rays for which no direction can be pinpointed - only their existence is
known to be a fact.


Besides the primary evaluation points, we use preference relations to
decide between equally well mapped objects. The preference (somewhat
arbitrarily) calls for parallelepipeds rather than wedges, non-
degenerate rather than degenerate views, for instance. This is not
crucially important, since the particulars of every mapping are
remembered, and a post-evaluation phase could be constructed, where
questionable matches could be further investigated.

Furthermore, as I have been pointing out, the system presented here is
not intended as a complete vision system. Its possible role in such a
system is discussed in Section 12 (future work).

OBJECT ISOLATION is simply the removal, from the active scene, of all
lines belonging to the current object. The general data-structure
allows parts of the scene to become part of the "subconscious" (see
appendix, Subsection 14.1), and information regarding each isolated
object is grouped into three different areas of the subconscious,
namely:


SL1. Final mapping, existing lines.

SL2. Final mapping, inserted lines.

SL3. Line-segments belonging to the object, but superseded by
inserted lines.

We will get back to this subject in connection with the discussion of a
possible object completion phase (Section 10).

Some of the items in this subsection may seem slightly undefined at this
stage, but everything will become clear as we now proceed to the tale of
the matching process.
9.0  PROTOTYPE MATCHING


9.1  STRATEGY  OUTLINE 

The  process  of  (partially)  mapping  a  prototype  onto  elements  of  the 
scene  is  crucial  to  this  vision  system.  That  task  is  by  no  means 
trivial  -  the  task  of  documenting  it  isn’t  either.  The  process  is  a 
fairly  complex  recursive  one,  which  I  will  do  my  best  to  describe 

clearly  and  concisely,  using  hierarchies  of  diagrams  with  pertinent 
comments  in  the  text. 

The  general  idea  is  as  follows: 

Assume  we  have  an  initial  correspondence  between  one  directed  line  in 
the  scene  and  one  directed  line  in  a  prototype  (I  use  "line"  here,  even 
though  it  is  not  endowed  with  coordinates). 

We now try to establish correspondences between the rays emanating from
the ends of those two lines. If we are lucky, those scene-rays are all
in their proper places. However, realistically, they very often are
not. There may be too few, or too many, or (even if the numbers agree)
they may not be pointing in the right directions. Different
alternatives must then be investigated, and the line-features are used
as prediction guides in this context.

Another problem is the fact that lines may have been broken up,
sometimes equipped with false but plausible-looking vertices, sometimes
with pieces missing. Other lines just die in the middle of nowhere, in
which case we must attempt to get to them from elsewhere in the
topology.

Thus  the  matching  process  is  of  necessity  recursive,  since  in  general  we 
have  to  investigate  the  consequences  of  alternate  choices,  recursively, 
before  finding  an  optimal  mapping.  As  a  first  approximation,  each 
mapped  vertex  defines  a  recursive  level.  If  that  mapping  is  successful, 
we  then  try  another  line-end  in  that  extended  context,  bumping  the 
level.  Ideally  this  carries  on  until  the  prototype  has  been  matched. 

In  practice  we  may  come  to  a  grinding  halt  for  many  reasons,  a  few  of 
which  are: 

Two  unfusable  scene-lines  (or  two  different  vertices)  are  put  into  an 
identity  relation  by  consequence  of  the  mapping. 

A  line-feature  does  not  check. 

Two  lines  extended  to  a  vertex  intersect  in  a  topologically  impossible 
place. 

A  line  is  too  long  or  too  short. 

Error conditions will be described in detail later; suffice it to say
here that we meet with conditions that necessitate recursive back-up.
Backing up to level R, we then investigate the next choice at that
level. All levels accounted for, the tree may become quite large, but
it is kept down to size by the use of features, as well as parallelity
and length classes, for screening purposes. We shall see this later.
Thus, given the initial mapping between two lines, the idea is to work
down the topology of the prototype, matching those elements with scene-
elements until a complete match is found or the recursive process is
exhausted, in which case the result is a partially mapped object. In
that process there is a maximizing mechanism, ensuring that we exit with
the best such partial match.

We do not try to work from several different vantage-points at once,
e.g. trying to link up several individually recognized features or
regions into a partial or complete object, although that might be a
possibility to be used in conjunction with the present scheme,
especially with good initial line-drawings.

The  recursion  has  been  programmed  explicitly  (as  opposed  to  using 
recursive  procedures),  for  several  reasons.  We  only  have  to  save  a  very 
limited  amount  of  information  at  each  level.  We  should  like  to  be  able 
to  have  access  to  all  stages  of  recursion  at  once,  and  to  be  able  to 
easily  back  up  to  any  desired  level.  We  also  save  time  and  space. 
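
The idea of explicitly programmed recursion with a maximizing mechanism
can be illustrated by a generic sketch. It is not the matching program
itself: the callbacks choices and consistent, and the use of "longest
partial assignment" as the measure of the best partial match, are
assumptions made for the illustration.

```python
def explicit_search(n_levels, choices, consistent):
    """Depth-first search with the recursion kept in explicit lists, so
    that every level is accessible at once and back-up is a simple pop.

    choices(level, partial) -> list of candidates for that level
    consistent(partial, candidate) -> bool
    Returns the longest partial assignment found (complete if possible).
    """
    stack = [iter(choices(0, []))]    # one candidate iterator per level
    partial = []                      # choice made at each completed level
    best = []                         # maximizing mechanism: best so far
    while stack:
        level = len(stack) - 1
        for cand in stack[-1]:
            if consistent(partial, cand):
                partial.append(cand)
                if len(partial) > len(best):
                    best = list(partial)
                if len(partial) == n_levels:
                    return best       # complete match
                stack.append(iter(choices(level + 1, partial)))
                break                 # bump the level
        else:
            stack.pop()               # alternatives exhausted: back up
            if partial:
                partial.pop()
    return best                       # best partial match
```

Only an iterator and one choice are saved per level, which reflects the
remark above that very little information needs to be kept at each stage
of the recursion.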

The first diagram, Figure 9.1, illustrates what has been said above. It
is simplified in the extreme, and in this section we shall proceed to
clarify the specifics of the process, giving diagrams for each building
block by itself (as far as possible), such as back-up, line-merge, ray-
identification ...

First, however, we shall describe the mapping data-structures and deal
with the concept of partially similar line-features, which is used for
purposes of hypothesis formation regarding new vertices.
Figure 9.1

Simplified  matching  strategy 
9.2  DATA-STRUCTURES

As a rule I do not like to burden the presentation with details, but in
this case they serve the honourable purpose of clarifying the rest of
the section, and making it easier to describe. For undefined terms, see
Section 2.

The  first  structure  is  the  template,  that  is  the  expanded  prototype 
topology  structure,  see  Figure  9.2.  For  each  line,  LENDV  names  end- 
vertices,  LENDP  names  orbital  successor  lines,  PARCLA  contains  the 
parallelity  classification,  and  LENCAT  the  length  categorization  within 

that  class.  There  is  also  storage  for  the  physical  entities  associated 
with  those  categories. 

Figure  9.2  also  provides  an  example  to  illustrate  this.  The  lines  are 
ordered  the  same  as  their  LF:s,  as  has  already  been  pointed  out.  Since 
the  general  data-structure  is  organized  similarly  (see  Subsection  14.1), 
it  is  easy  to  search  the  topologies  of  prototype  and  scene  in  parallel, 
setting  up  and  checking  correspondences. 

The  length-class  information  is  used  throughout  the  matching  process, 
for  discriminatory  purposes,  allowing  for  some  tolerances,  of  course. 

In  cases  where  the  prototype  indicates  perspective  deformations,  those 
tolerances  are  more  liberal,  in  the  proper  directions. 

Figure  9.3  shows  some  of  the  structure  used  for  recursion. 

Figure 9.2

Expanded prototype structure

Figure 9.3

Recursion data-structure

MAPORD is a vector containing (in the timewise order of mapping) the
prototype lines referenced so far. The corresponding scene-elements are
stored in PLMAP etc.

MPORDS is indexed on recursive level, and contains pointers to the
currently referenced MAPORD entries, at each level. As we shall see,
those pointers may cross.

MAPLS, also indexed on level, contains pointers to the last MAPORD items
created at the different levels.

Thus the main mapping alternatives are listed in MAPORD, but there are
usually several possibilities for each one of those entries. I feel I
should clarify one thing already, namely that a new MAPORD entry is
created if and only if a previously unreferenced prototype line is
encountered in orbiting a vertex. Furthermore, a MAPORD entry
constitutes a mapping alternative if and only if that P-line is unmapped
at one end, i.e. that end-vertex has not been orbited (modulo recursive
back-up).

In the same diagram, Figure 9.3, we demonstrate how mappings are
recorded.

PLMAP contains, for each prototype line end, the corresponding scene
element.

LLEV, indexed in parallel, contains the recursive level at which that
mapping took place.

PLMAPO and LLEVO are 1-level push-down stacks for PLMAP and LLEV, used
when a line is being replaced (in the creation of a new vertex, or
connecting two vertices), as will be explained later.

PVMAP and VLEV store vertex information.
The final item in this subsection deals with the line-fusion mechanism.
Fusions deal with collinearities, and only take place for base-lines
(every vertex mapping has a base-line, the MAPORD entry).

If  we  want  to  explore  a  new  branch  at  some  level,  we  check  (iteratively) 
whether  the  present  base-line  may  be  extended.  If  that  happens  to  be 
so,  that  alternative  is  tried  and  the  fact  is  recorded  for  the  P-line 
end,  at  which  the  fusion  took  place. 

LFUSE is a stack (6 levels), for each P-line end, containing packed
pointers (a sixpack indeed) into a common area.

LFUSES  is  that  common  area,  where  each  word  supplies  enough  information 
(about  the  fusion  of  two  lines)  to  enable  proper  back-up,  if  and  when 
necessary. 

At a fusion, which is always based on existing collinearity links, a new
compound line is created and linked into the data-structure, while the
constituents are shoved into the subconscious. Note that the fusion and
line replacement mechanisms use different stacks, and are therefore
independent.

We shall now turn to a discussion of partially similar line-features.
9.3  PARTIALLY SIMILAR LINE-FEATURES


Since the line-feature is used in checking all lines and their end
constellations, it seemed a natural thing to use it also in determining
what was wrong (and what should be done by way of correction) if the
check was negative.


The  input  to  this  algorithm  consists  of  two  feature  words,  and  two 
direction  bits.  The  first  feature  word  is  used  as  a  template.  The 
second  is  the  one  to  be  checked,  and  for  which  (if  necessary  and 
possible)  corrections  should  be  indicated.  For  reasons  of  sanity  we 
require  that  one  end  of  the  second  feature  be  OK  (this  is  checked,  of 
course).  The  requirement  is  also  a  realistic  one  as  regards  the  way  in 
which  the  model  is  traversed,  one  line-end  at  a  time,  such  that  the 
other  end  is  mapped  previously. 

I shall give no details of the program here - it is straight-forward -
only the format of the modification word (MODIF), which is one of the
outputs, and a few examples. All of that in Figure 9.4. Orbits start
from the base-line, and the rays are referenced in that order.

As "bare" we define a vertex with as many insertions as there are rays
(excepting the base-line). The entire MODIF word (part (a) in the
figure) is defined as "ambiguous" if and only if there is an ambiguous
ray position somewhere, e.g. if we do not know which of several rays to
delete.

Part (b) gives the template for cases (c), (d), and (e), in which the
(a)  | Special bits | List of action-elements (2 bits for each ray |
     | (first 2)    | at the vertex subject to modification)       |

     Special-bit codes               Action-element codes

     00  Unambiguous, not bare       00  No change
     01  Unambiguous, bare           01  Insert ray here
     10  Ambiguous, not bare         10  Delete this ray
     11  Ambiguous, bare             11  Ambiguous

Figure 9.4

Line-feature modification word - MODIF
lower vertices are the ones to be compared and corrected. The resulting
MODIF words are given in the figure as well. Of course, if there is no
change, MODIF is set to zero.
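
The MODIF layout of Figure 9.4 can be sketched as a pair of packing
routines. The two special bits followed by one 2-bit action element per
ray follow the figure; the word width of 36 bits, the left-aligned
packing, and the function names are assumptions made for the
illustration only.

```python
WORD_BITS = 36   # assumed word width, not stated in the text

LEAVE, INSERT, DELETE, AMBIG = 0b00, 0b01, 0b10, 0b11

def pack_modif(actions, bare=False):
    """Pack the special bits and the action elements into one word.

    actions: list of 2-bit action codes, in orbit order from the base-line.
    bare:    whether the vertex is "bare" (supplied by the caller).
    """
    ambiguous = any(a == AMBIG for a in actions)
    fields = [(int(ambiguous) << 1) | int(bare)] + list(actions)
    if 2 * len(fields) > WORD_BITS:
        raise ValueError("too many rays for one word")
    word = 0
    for f in fields:
        word = (word << 2) | f
    return word << (WORD_BITS - 2 * len(fields))   # pad with 00 elements

def unpack_modif(word, n_rays):
    """Return (ambiguous, bare, actions) for the first n_rays elements."""
    fields = [(word >> (WORD_BITS - 2 * (i + 1))) & 0b11
              for i in range(n_rays + 1)]
    special = fields[0]
    return bool(special & 0b10), bool(special & 0b01), fields[1:]
```

Note that a word with no changes and special bits 00 packs to zero, in
agreement with the convention stated above.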

The MODIF word is subsequently deployed as a template for the mapping of
rays at the vertex currently being investigated, as we shall see later.

The following subsection describes an extension of the concept treated
here.

9.4  LF MODIFICATION RECONCILIATION

This  idea  arose  due  to  the  fact  that  a  vertex  may  sometimes  be  ambiguous 
from  the  direction  of  one  line,  but  unambiguous  from  another.  Figure 
9.5  shows  an  example  of  this. 


The prototype context is given in (a), and the scene in (b). The vertex
under investigation is of course V, at the intersection of L1 and L2.
Suppose we are dealing with the line L1, in an effort to map the rays of
V. The feature template (c), and the scene-feature (d), illustrate the
situation. The ray L4 is easily seen to be superfluous, but L5, L3, and
L2 are all converging with L8, so the program, knowing that at least two
of them have to be deleted but not knowing which two, marks all three as
ambiguous. However, if L2 were the base line (f), and LP2 the template
(e), then the situation is unambiguous (based on the parallelity bit).
We introduce the following definition:

RECONCILIATION of a feature modification word from one line to another
(at the same vertex) is the process of rearranging the information of
that MODIF word so as to make it applicable from the vantage point of
the second line.

In our case, reconciliation of MODIF from L2 to L1 is a way to
disambiguate MODIF of L1, using MODIF of L2 and the connectivity of the
vertex.

The  following  algorithm  is  used: 

RECONCILIATION  ALGORITHM. 

Assume the MODIF word is [M,A1,A2,A3, ...,An,00, ...,00], where the A:s
stand for 2-bit action items, and M for the two characteristic bits, in
this case [00], since we are only interested in reconciling useful
MODIF:s.

Let PL1 and SCL1 be the template and scene element for the MODIF word,
which is to be reconciled to PL2 and SCL2. We then define the following
quantities:

DP  = Orbital distance from PL1 to PL2.

DSC = Orbital distance from SCL1 to SCL2.

DL  = Number of action elements indicating "leave", [00], up
      to and including "present" element in MODIF.

DI  = Same for "insert", [01].

DD  = Same for "delete", [10].

We then look for that action element of MODIF, for which

     DP = DI + DL

That element must be a "leave", [00], and we also must have

     DSC = DD + DL

This ensures identity of the second line in prototype, scene, and MODIF
word, so that the reconciliation to that line may take place. The new
MODIF word will then have the following format, assuming the action
element found above is Ak:

     MODIF(rec) = [M,Ak+1, ...,An,Ak,A1, ...,Ak-1,00, ...,00].

In our example, Figure 9.5, MODIF for L2 is reconciled to L1 thus:

     DP = DSC = DL = 1,  DI = DD = 0

     MODIF(old) = [00,00,10,10,01,10, ...]

     MODIF(rec) = [00,10,10,01,10,00, ...]


So we get the correct indication of the need for an inserted ray
parallel to L7, while L3, L4, and L5 are all branded as non-conformists
and eliminated.
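
The reconciliation algorithm can be written out as a short sketch. The
action elements are held in a list rather than packed into a word (an
assumption made for readability), the special bits M are assumed to be
[00] as required above, and the name reconcile is hypothetical.

```python
LEAVE, INSERT, DELETE = 0b00, 0b01, 0b10

def reconcile(actions, dp, dsc):
    """Rearrange the action elements [A1, ..., An] of a MODIF word so that
    they apply from the vantage point of the second line.

    dp:  orbital distance from PL1 to PL2 (prototype)
    dsc: orbital distance from SCL1 to SCL2 (scene)
    Returns [Ak+1, ..., An, Ak, A1, ..., Ak-1], or None if no action
    element satisfies the two conditions.
    """
    dl = di = dd = 0                  # running counts up to the present element
    for k, a in enumerate(actions):
        if a == LEAVE:
            dl += 1
        elif a == INSERT:
            di += 1
        elif a == DELETE:
            dd += 1
        else:
            return None               # ambiguous element: word is not useful
        # The element sought must be a "leave" with DP = DI + DL and
        # DSC = DD + DL.
        if a == LEAVE and dp == di + dl and dsc == dd + dl:
            return actions[k + 1:] + [a] + actions[:k]
    return None
```

Applied to the example above, reconcile([0b00, 0b10, 0b10, 0b01, 0b10],
1, 1) yields the elements 10, 10, 01, 10, 00, as in MODIF(rec).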
9.5  MORE GENERALITIES - EXAMPLE

The initial mapping references one scene-line, and it is assumed that
the rays emanating from that original line all map into the
corresponding prototype elements. Of course we know that the line-
features agree, by definition. Thus the first three recursive levels
are:

Lev1. The original line, provided by the key.

Lev2. The first end-vertex, and its rays.

Lev3. The other end-vertex, and rays.

We never allow recursive back-up to reach level 3, or below. Once
established, those mappings remain fixed. The reason for this is that
levels 1, 2, and 3 all refer to mappings directly involving the key, and
it would not make sense to provide back-up past that stage.

We note here that the initial mapping is always given as above,
regardless whether the key was provided by a line-feature or a compound
feature. In the latter case, the first line referenced by the feature
is used. The other line will certainly be mapped later, since those
features of object and prototype are in agreement.

There is one inequity here - which I hasten to admit before being found
out - and that is the fact that the second line of a CF, being mapped at
level 2 or 3 at one end but 4 or more at the other, is subjected to
recursive back-up (and thus to extension, for example). If we really
wanted to push things we should also (when the results so indicate)
investigate the case of having the second line as the original. This
would only very rarely make a difference, however, and so is not worth
the extra computing.

This is especially so since, if the parsing process reaches the state of
dealing with L-mappings as well, this second alternative will be
investigated (but the first one, of course, will not be repeated).

There are two basic phases in the recursive process. The first phase
exhausts the LF-consistent (base-lines) mappings ("F-mappings") - does
not accept any others - and thus branches out over the most dependably
mapped part of the topology. The second phase deals with more difficult
mappings, utilizing partial feature similarity and reconciliation.

During the second phase we may get back into elements mapped during the
first, due to recursive back-up; by this time they are not treated as
special. The second-phase mappings are called consequence mappings
("C-mappings").

In order to make these things a bit clearer I have provided a simple and
complete example of the typical actions of the mapping process. It does
not contain any of the more exotic pathologies, only one partially
missing line and a couple of superfluous ones. It does not necessitate
recursive back-up. Figure 9.6 gives prototype (a) and scene (b), and
also illustrates various stages of the mapping. The table below
demonstrates the order of the mapping process for this example.

Quantities in parentheses refer to the scene, others to the prototype.
First and second ray refer to elements being referenced for the first
time. This is the order in which they are introduced into MAPORD
(Figure 9.3).

Table 9.1

Order of exemplified mapping

LEVEL   ORBITED VERTEX   FIRST RAY     SECOND RAY

  1                      PL2 (L7)
  2     PV3 (V6)         PL3 (L10)
  3     PV2 (V4)         PL8 (L8)      PL1 (L6)
  4     PV4 (V7)         PL4 (L11)     PL9 (L5)
  5     PV1 (V3)         PL6 (L4)
  6     PV5 (V1)         PL5 (L1)
  7     PV7 (V5)         PL7 (IR1)
  8     PV6 (V2)

Here are some comments:

Levels 1, 2, and 3 constitute the initial mapping, (c) in the figure.

Levels 4, 5, and 6 represent additional F-mappings, almost completing
the object, (d).

At level 7 (part (e)), partial similarity (V5 of L8 and PV7 of PL8) was
used, discarding L9 and inserting the tentative ray IR1 (of unit
length), based on the parallelity class of L6 and L11. IR1 is then
linked (one way) with L12, since they are found to be collinear.

At level 8 (part (f)), finally, partial similarity for L4 and PL6 is
used to get rid of L2 and L3. Finding that the other end of PL7 is
mapped into IR1, and using the collinearity, we decide that the mapping
of PL7 is OK. We insert the compound line IL1, and find that the object
is now complete. All lines are LF-testable and OK.

Note that if partial similarity hadn't been able to make sense of the
situation at V2, that vertex would have been constructed basically as an
intersection of L1 and L4. Then IL1 would have been inserted as a
replacement for IR1.
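
The collinearity decisions at levels 7 and 8 (IR1 against L12) can be
illustrated with a small test on two line segments. This is only a
minimal sketch in Python, not the routine of the actual program: the
function name and both tolerances are assumptions made for the
illustration.

```python
import math


def collinear(seg_a, seg_b, angle_tol_deg=5.0, dist_tol=2.0):
    """Return True if seg_b lies approximately on the line through seg_a.

    Segments are ((x1, y1), (x2, y2)). Both tolerances are illustrative
    assumptions, not parameters taken from the report.
    """
    (ax1, ay1), (ax2, ay2) = seg_a
    (bx1, by1), (bx2, by2) = seg_b
    dax, day = ax2 - ax1, ay2 - ay1
    dbx, dby = bx2 - bx1, by2 - by1
    len_a = math.hypot(dax, day)
    len_b = math.hypot(dbx, dby)
    if len_a == 0 or len_b == 0:
        return False
    # |sin| of the angle between the two (undirected) lines.
    sin_angle = abs(dax * dby - day * dbx) / (len_a * len_b)
    if sin_angle > math.sin(math.radians(angle_tol_deg)):
        return False

    def offset(px, py):
        # Perpendicular distance of a point from the line through seg_a.
        return abs(dax * (py - ay1) - day * (px - ax1)) / len_a

    return offset(bx1, by1) <= dist_tol and offset(bx2, by2) <= dist_tol
```

A short inserted ray and a longer scene line further along the same
direction would pass this test, whereas a parallel but displaced line
would not.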

The  next  subsection  gives  a  fairly  detailed  account  of  the  recursive 
process. 


9.6  THE  RECURSIVE  PROCESS 


The presentation will center around six diagrams. The first provides
the main flow. The second deals with vertex orbiting and the third with
ray mapping. The next two explain the function of erasure and back-up.
The last diagram gives a detailed account of the routine taking care of
back-up. Messy though some of them may seem, the flowcharts only record
the main actions or branches. Minute detail is of course unnecessary
for the purposes of this presentation.

Figure 9.7 is the main flow diagram. Simply stated, we study "the next
available alternative", that is, the next un-orbited P-line end in the
MAPORD-list. This end may already be flagged as backing up, which means
that it is either an inserted ray or that we have investigated it
before, and are now left with the final alternative, namely regarding
the line as a ray. In that case we see if there is enough information
in the prototype topology to determine an intersection consequence
vertex ("INCOV"). Thus, in part (e) of Figure 9.6, the vertex V2 would
be an INCOV of L1 and L4, since their prototype counterparts, PL4 and
PL5, are linked at PV5 (part (a)).
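
The INCOV idea can be sketched in a few lines of Python. The sketch
assumes a very simple representation that is not the one used by the
program: prototype lines as pairs of vertex names, scene lines as pairs
of points, and a dictionary holding the current mapping. All names are
hypothetical.

```python
def shared_prototype_vertex(proto_lines, pl_a, pl_b):
    """Return the single prototype vertex common to two P-lines, if any."""
    common = set(proto_lines[pl_a]) & set(proto_lines[pl_b])
    return next(iter(common)) if len(common) == 1 else None


def intersect(line_a, line_b, eps=1e-9):
    """Intersection point of two infinite lines given by two points each."""
    (x1, y1), (x2, y2) = line_a
    (x3, y3), (x4, y4) = line_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None  # parallel lines define no vertex
    det_a = x1 * y2 - y1 * x2
    det_b = x3 * y4 - y3 * x4
    px = (det_a * (x3 - x4) - (x1 - x2) * det_b) / denom
    py = (det_a * (y3 - y4) - (y1 - y2) * det_b) / denom
    return (px, py)


def incov(proto_lines, mapping, scene_lines, l_a, l_b):
    """Hypothesize a vertex from two scene lines whose prototype
    counterparts are linked in the prototype topology."""
    pv = shared_prototype_vertex(proto_lines, mapping[l_a], mapping[l_b])
    if pv is None:
        return None
    point = intersect(scene_lines[l_a], scene_lines[l_b])
    return None if point is None else (pv, point)
```

The prototype topology decides whether a vertex may be hypothesized at
all; the scene geometry only supplies its position.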

Otherwise we study the vertex, using the MODIF word (Figure 9.4) to
decide what actions to take. If the vertex is ambiguous or the base-line
would be a ray before insertions, we look for an extension of the
line. The diagram should explain most of this. The "pre-orbit scan"
simply finds out whether the vertex is mapped by consequence of two
lines, as above, which influences the branching.

The diagrams in Figure 9.8 and Figure 9.9 demonstrate the process of
mapping a vertex-constellation (orbiting a vertex). The diagrams should
be more or less self-explanatory. The heart of it is the referencing of
one ray-position at a time, and the MODIF-based action decision at that
point. When the vertex has been orbited we check the bareness again (a
ray may have been replaced on the basis of its angular argument), and we
demand that the finalized vertex contain at least one scene-ray besides
the base-line. If that is the case, we then check that all new two-way
mapped lines are LF-consistent.

Figure 9.10 explains the labels ERASE and BU in the previous diagrams,
and shows how the process ends, if a complete object hasn't been found
before then.


The final diagram is in Figure 9.11, and it shows the deletion (back-up)
of the actions at some recursive level, and of associated information.
The back-up program has mechanisms taking care of collinearities, that
is, of fusions and un-fusions of base-lines. If a consequence vertex is
found, this routine backs up one more level, since it would be no use
trying it again from another direction. This is the case also for a
negative ray, for which an INCOV must have been attempted at some point
in time.


It  should  be  clear(er)  now,  how  the  recursive  process  makes  use  of
prototype  topology  as  well  as  line-feature  information  and  equality 
classes  (etc.)  to  provide  guidance,  in  order  to  avoid  much  superfluous 
work,  to  direct  back-up,  and  so  forth. 

The following section (a much shorter one) presents object completion,
which could be a final process in this intermediate level system.


10.0 OBJECT COMPLETION


THE  BASIC  IDEA: 


The idea behind this phase is the following. Suppose the final scene
(with all mapped objects removed) still contains some lines, and that
some of the isolated objects are only partially mapped. It is logical
in that situation to check and see whether some partial(s) might not be
extended, or even completed, using those remaining scene-elements.

The concept is a very simple one, and so is the execution. We revive
each partial (in order of decreasing complexity) and look for extensions
and intersections of its rays (one-way mapped lines). Those are then
tested in a process similar to the original mapping process.

The way this is done, practically, is simply as follows.
First we make sure the physical properties of the lines belonging to an
object reflect the topology of that object. That is, the vertices are
recomputed, and the line-coordinates are changed accordingly. This is
done for all objects, whether completely or partially mapped.
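
The adjustment step can be pictured as follows. The text does not say
how the vertices are recomputed, so this Python sketch simply assumes
that each vertex becomes the mean of the line-ends linked to it, and
that those ends are then moved onto the vertex. The data layout and the
function name are hypothetical.

```python
def recompute_vertices(lines, topology):
    """Recompute vertices and snap the linked line-ends onto them.

    lines:    line name -> [(x, y), (x, y)]
    topology: vertex name -> list of (line name, end index 0 or 1)

    Assumption for this sketch: a vertex is the mean of its linked ends.
    """
    vertices = {}
    for vertex, ends in topology.items():
        xs = [lines[name][end][0] for name, end in ends]
        ys = [lines[name][end][1] for name, end in ends]
        vertices[vertex] = (sum(xs) / len(xs), sum(ys) / len(ys))

    # Copy the lines, then move every linked end to its vertex position.
    adjusted = {name: list(ends) for name, ends in lines.items()}
    for vertex, ends in topology.items():
        for name, end in ends:
            adjusted[name][end] = vertices[vertex]
    return vertices, adjusted
```

Line-ends that are not linked to any vertex (the free ends of rays) are
left where they were.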

Following this, the actual completion phase begins, and proceeds thus.

The partial is brought out from the subconscious, and a new cross-
reference and tentative-vertex evaluation is performed, this time with
more liberal parameters, for instance allowing first intersections of
pairs of unlinked lines regardless of distances. There is one important
reservation here, namely that we do not allow extra lines to join the
fully mapped vertices of the object. We may safely allow relaxation of
parameters at this point, for several reasons:


The  partial  will  be  put  together  as  before,  except  possibly  for  new 
elements  being  linked  to  incompletely  mapped  parts  of  the  object. 

The scene is uncomplicated, having only comparatively few lines.

We are not dependent on the scene for a mapping key.

The mapping process ensures correct topology and feature consistency.


A new mapping is only attempted if there are changes in the connectivity
of the object (due to the new cross-reference pass). Using a fully
mapped line for the key, we call the mapping program, which returns with
the best partial mapping, according to that new structure. This partial
is at least as good as the original mapping, since the original will
have been encountered during the matching. We compute the new or
amended vertices, adjust the lines, and ship the object back into the
subconscious.

This  process  continues,  with  the  next  partial,  until  either  the  scene  or 
the  subconscious  is  exhausted. 

WHY NOT?


When this section was first written I had only done some preliminary
work on implementing object completion. Having spent some more time
thinking about these things, I decided that it might all be a bad idea.
Let me explain...

First, experiments with many scenes have rarely produced cases where
such a scheme would contribute to the performance of the system.

Secondly, it may well happen that spurious, irrelevant lines are
absorbed into partial mappings, since linkages are less strictly
required.

Thirdly, the elaborate heuristics for formation of tentative vertices,
as well as the scheme for using partially similar features as keys, both
contribute towards obviating the need for a specific object completion
pass.


The last reason - a matter of policy - is that we do not strive to
arrive at complete interpretations at any cost. If the scene is
ambiguous or otherwise too difficult, we must rely on an extended scheme
(such as proposed in Section 12) for further processing. Object
completion belongs in that context, utilizing obstruction-, support-,
and depth relationships. The present recording of the mappings
(constituting scene interpretations) should prove well suited to the
requirements of such extended schemes.


POST-PROCESSING

In order to show interpreted scenes more clearly, and to demonstrate the
power of a knowledge-directed scheme, I could have added a hidden-line
elimination phase. That process would be based on obstruction
relationships, to be provided manually, lacking 3D knowledge and a
support theory. Those concepts are no longer "intermediate-level", and
one has to stop somewhere ...

The  elimination  of  hidden  lines  or  line-segments  could  be  very 
straightforward.  Basically,  each  line  of  an  obstructed  object  would  be 
intersected  with  the  outlines  of  all  obstructing  ones,  keeping  the 
unobstructed  segments. 
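
To make the idea concrete, here is a minimal Python sketch of such an
elimination step. It is not part of the present system. It assumes that
every obstructing outline is a convex polygon with its vertices listed
counter-clockwise, and it uses standard parametric clipping (in the
style of Cyrus-Beck); the function names are hypothetical.

```python
def hidden_interval(p0, p1, polygon):
    """Parameter interval (t_in, t_out) of segment p0->p1 that lies inside
    a convex, counter-clockwise polygon, or None if nothing is covered."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    t_in, t_out = 0.0, 1.0
    n = len(polygon)
    for i in range(n):
        ex, ey = polygon[i]
        fx, fy = polygon[(i + 1) % n]
        # Inward normal of a counter-clockwise edge.
        nx, ny = -(fy - ey), fx - ex
        num = nx * (p0[0] - ex) + ny * (p0[1] - ey)
        den = nx * dx + ny * dy
        if den == 0:
            if num < 0:
                return None  # parallel to this edge and outside it
            continue
        t = -num / den
        if den > 0:
            t_in = max(t_in, t)    # segment enters the half-plane
        else:
            t_out = min(t_out, t)  # segment leaves the half-plane
    return (t_in, t_out) if t_in < t_out else None


def point_at(p0, p1, t):
    """Point on the segment p0->p1 at parameter t."""
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))


def visible_segments(segment, obstructing_outlines):
    """Split one line into the pieces not covered by any outline."""
    pieces = [segment]
    for polygon in obstructing_outlines:
        remaining = []
        for p0, p1 in pieces:
            hit = hidden_interval(p0, p1, polygon)
            if hit is None:
                remaining.append((p0, p1))
                continue
            t_in, t_out = hit
            if t_in > 0.0:
                remaining.append((p0, point_at(p0, p1, t_in)))
            if t_out < 1.0:
                remaining.append((point_at(p0, p1, t_out), p1))
        pieces = remaining
    return pieces
```

A line crossing an obstructing face is split into the two pieces on
either side of it, and a line entirely behind the face disappears.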

I  have  resisted  this  temptation  to  produce  good-looking  final  drawings, 
partly  because  I  have  had  better  things  to  do  with  my  time,  but  mainly 
because  such  a  program  would  not  serve  a  useful  purpose  within  the 
frame-work  of  the  present  system. 

The  only  post-processing  presently  in  this  system  is  the  completion  of 
fully  mapped  parts  of  objects,  according  to  the  topologies  of  their 
respective  matching  prototypes. 

By way of clarifying the concepts presented so far, and demonstrating
the abilities (and weaknesses) of this intermediate-level vision system,
we now give some typical examples of scenes and their analysis.


11.0 EXAMPLES - RESULTS - DISCUSSIONS


11.1  COMMENTS 

First some general comments. All examples of system performance in this
report represent scenes of uniformly coloured (whitish) objects, which
are not unrealistically ideal, inasmuch as they are fairly beat-up,
having been manhandled (and kicked around?) by many people since they
were made [Falk 1970].

The scene background is always a black cloth covering the table-top.
(Feel free to regard this as cheating!)

In most of the examples normal office-type lighting (over-head, diffuse)
was used, otherwise the auxiliary (diffused) light sources surrounding
the "Hand-Eye Table". Needless to say none of the examples have been in
the least edited, nor are they a non-typical, selective sample of scenes
that work especially well. They also all use standard parameter
settings. Finally, some scenes were created by people other than
myself.


The pattern of presentation of any given example is the logical
succession starting with the TV-image, going through pre-processing,
looping through object mapping and isolation (showing isolated object
and amended scene each time), and finally presenting the interpreted
scene as a conglomerate of partially or fully mapped objects.

The examples are commented as needed, to focus attention on points of
particular interest or typicality. The section will end with a
discussion of results and system performance, treating fortes as well as
shortcomings.


11.2  EXAMPLES  OF  SYSTEM  PERFORMANCE 

The  example  that  has  been  presented  in  parallel  with  this  unfolding 
story  provides  instances  of  shadows,  broken  lines,  occluded  vertices, 
missing  lines,  double  lines,  and  a  split  object.  The  initial
line-drawing  and  the  final  scene  interpretation  are  reproduced  here  for
convenience  (Figure  11.1). 

There were good keys into all of the objects, and the matching program
was able to find complete mappings in all cases. We note how essential
the line-fusion heuristic was here, in establishing the lower vertices
of the large body, which had keys only to the top part. The same
heuristics were instrumental in finding the long horizontal object, and
the object in front.

The wedge presented no great problems, only extrapolated vertices. Note
also how double lines are removed with an object when they may be
assumed to be caused by it (judging by closeness and parallelity to
object lines).

The next scene (Figure 11.2, Figure 11.3, Figure 11.4, Figure 11.5,
Figure 11.6, Figure 11.7, and Figure 11.8) has a shadow in front of the
small cube, which makes that dimension hard to determine, and the final
object is slightly distorted here. The glare-line at the left end of
the wedge presented no problem - the feature-guided fusion mechanism
continued past it in search of the right kind of vertex. The line could
not be assumed to be a part of the final object, and it was left as
garbage.

The  scene  is  finally  parsed  correctly,  with  some  distortion.


The  third  scene  presented  here  has  one  area  completely  messed  up  by  both 
shadows  and  glare,  namely  the  left  side  of  the  right-hand 
parallelepiped.  Both  the  TV-picture  (Figure  11.9)  and  the  edge-  and 
line-drawings  (Figure  11.10)  as  well  as  the  tentative  vertex 
connectivity  (Figure  11.11)  show  the  effects  thereof.

The  shadow  on  top  of  that  object,  caused  by  the  reclining  beam,  gives 
rise  to  a  very  specific  problem  in  the  extraction  of  the  latter  object 
(Figure  11.12).  The  lower  right-hand  vertex  of  that  beam  does  not  get 
connected  properly,  only  through  a  short  segment  in-between.  That 
vertex,  then,  is  found  by  the  matching  heuristics  as  an  intersection 
consequence  vertex,  defined  by  an  inserted  ray  (from  the  top)  and  an 
existing  ray  (from  the  right).  At  that  point,  the  feature  template 
demands  an  inserted  ray,  pointing  to  the  left.  That  ray  is  found  to  be
collinear  (and  is  therefore  linked)  with  the  bottom  line  of  the  beam,  and
that  object  may  finally  be  completed. 

After  easily  extracting  the  wedge  (Figure  11.13),  the  remaining  object 
is  the  shadowed  parallelepiped.  The  center  vertex  is  established  by  two 
existing  rays,  but  the  three  vertices  on  the  left  outline  of  the  object 
are  hypothesized  on  the  basis  of  the  intersection-consequence  heuristic, 
using  inserted  rays  when  necessary.  This  is  shown  in  Figure  11.14. 

Hence this scene is finally interpreted correctly (Figure 11.15).

Figure 11.11

SC12: Tentative vertices - First

The fourth scene is slightly more complex, in that it contains five
objects (TV-image in Figure 11.16, edges and initial lines in Figure
11.17), but it presents no difficulties we haven't encountered in
previous examples, including coincidental line-vertex alignments (Figure
11.18), which are a source of sadistic delight to the template-driven
matcher.

The small cube is first to go, an easy match. There are also very good
keys into the large wedge, which follows next (Figure 11.18). The top
parallelepiped is then severed from the cube and the small wedge,
completed and isolated, taking the double line with it (Figure 11.20).

The wedge is the next object to be extracted (Figure 11.21), with ample
usage of fusion-, insertion-, and intersection-consequence heuristics.

The amended scene (Figure 11.22, top) shows a cube with two false
vertices (top-left), which are however easily discarded by the feature-
template, parallelity-class, and length-class heuristics, so that the
cube may be extracted in perfect shape.

The shadow-lines are left as garbage, as shown in Figure 11.23, which
also presents all of the objects superimposed.

Figure 11.20

SC3: Amended scene - Third object

The fifth example (TV-image in Figure 11.24) looks misleadingly simple,
at least compared with our previous examples. However, there are a
couple of subtle problems involved in parsing this scene. Looking at
the edge-drawing and the initial line-drawing in Figure 11.25, we note
the presence of a short line-segment at the lowest vertex of the large
wedge, and that the long bottom line of that body is not connected at
its right end. This gives rise to a tentative, somewhat narrowed wedge
which is however discarded in favour of the correct one.

Figure 11.26 shows the first match, a parallelepiped.

The correct match for the second object (the wedge mentioned above),
shown in Figure 11.27, uses the long bottom line rather than the short
segment. The connected drawing (top of Figure 11.26) indicates why
those two alternatives are investigated (connectivity of left lower
vertex).

It may be interesting to see some of the contenders for this second
object, and I have included two figures containing alternative (but not
as good) matches, namely Figure 11.28 and Figure 11.29. The first of
those contains the narrow wedge I just mentioned, the second (top) a
partial wedge with the triangular face on the right.

Now there is only one thing left in the scene, a wedge. The edge-drawing
(Figure 11.25) clearly shows that one short interior line-segment is
misdirected. This state of affairs gives rise to an alternative,
shorter wedge, which is eventually discarded for the better match shown
in Figure 11.30.

Figure 11.30

SC2: Residual scene - Third object

The final amended scene, and the object superimposition, are given in
Figure 11.31. The normal imperfections did not cause much trouble in
this scene. They are left as impossible.

The next example is included because it demonstrates the potential
usefulness of features of connection independent elements, such as
constellations of parallel lines. Figure 11.32 shows this scene, which
contains a concave object, namely an L-beam. So this example also
indicates how concave objects may be thought of as consisting of several
recognisable parts, which is discussed in Section 12.

Figure  11.33  and  Figure  11.34  should  need  no  comments  by  now. 

The  case  of  the  L-beam  is  more  interesting.  Figure  11.35  demonstrates 
how  the  program  deals  with  this  situation.  It  finds  a  parallelepiped  at 
the  bottom,  thereby  splitting  the  object  into  two.  It  preferred  the
longer  version  of  that  PAREP  to  the  shorter  one,  due  to  the  perfect
outline  at  bottom  left.

Anyhow,  in  the  amended  scene  (top  of  Figure  11.36)  I  can  clearly  see  a 
parallelepiped.  The  program  could  not.  The  reason  is  that  there  is  not 
one  good  vertex  around,  which  might  provide  a  starting  point.  Here  is 
where  global  features,  based  on  vertex  independent  line  constellations, 
would  have  been  most  useful.  It  is  easy  to  see  how,  for  instance,  a
parallelity  feature  might  have  provided  a  key  into  the  mapping  of  this
object.

If the top of the L-beam had been found, the latter object would have
been neatly and automatically split into two recognizable parts. In
general, concave objects would necessitate more special treatment, as
discussed in Section 12.



The following scene - the seventh presented here - is the most
complicated one, in terms of the number of objects. It is also
difficult due to the small overall scale. Figure 11.37 shows this
scene.

In  the  line-drawing  (bottom  of  Figure  11.38)  even  I  (though  a  human)  do 
not  know  exactly  what  is  going  on,  since  there  are  many  lines  missing  in 
strategic  places,  as  well  as  unwanted  ones  present  in  other  places.  It 
is  very  noticeable  in  this  example  how  the  parsing  strategy  of  isolating 
one  object  at  a  time  (and  removing  the  lines  belonging  to  it)  has  the 
effect  of  cleaning  up  the  picture,  thereby  facilitating  subsequent  work. 
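
The strategy of isolating one object at a time can be summarized by the
following loop. It is only a Python sketch: the matcher is passed in as
a function, and its interface (returning an object name and the set of
scene lines it accounts for, or None) is an assumption made for the
illustration, not the interface of the actual matching program.

```python
def parse_scene(scene_lines, find_best_object):
    """Repeatedly extract the best-matching object and remove its lines.

    find_best_object(remaining) is a hypothetical matcher returning
    (object_name, lines_used) or None when nothing more can be mapped.
    """
    remaining = set(scene_lines)
    objects = []
    while remaining:
        match = find_best_object(remaining)
        if match is None:
            break
        name, used = match
        used = set(used) & remaining
        if not used:
            break  # guard against a matcher that makes no progress
        objects.append((name, used))
        remaining -= used  # the amended scene is cleaner for the next pass
    return objects, remaining
```

Whatever is left in the second return value corresponds to the residual
scene, that is, the shadow- and glare-lines left as garbage.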

The first object is extracted without difficulty (Figure 11.39), however
with some distortion due to glare effects.

The  extraction  of  the  second  object  (Figure  11.40)  does  not  present 
anything  new  and  exciting,  either. 

The  case  of  the  parallelepiped  resting  on  the  wedges  is  more 
interesting.  Here  the  matcher  initially  finds  a  much  shorter  PAREP  (I 
think  you  can  easily  see  where),  which  is  eventually  discarded  when 
further  investigation  yields  the  longer  (and  better)  one,  shown  in
Figure  11.41.  The  bottom  right  line  is  not  assimilated  into  this  object
due  to  a  slight  distortion  of  the  front  face. 

In Figure 11.42 the wedge on the right is found. Not trivial - but we
have seen similar examples previously.

That is the case with the tall, upright parallelepiped as well, which is
the next object to be isolated (Figure 11.43).

Finally the program finds two small, partially mapped parallelepipeds,
as shown in Figure 11.44 and Figure 11.45. There is no way to tell
where they end, and I couldn't have done better myself, on this scene.

The  resulting  interpretation  is  presented  in  Figure  11.46. 


Figure  11.45
I  scene  -  Seventh  obj

The last example does not contain as many objects as the previous one,
but it is far messier in terms of shadows and glare (Figure 11.47). In
fact, one of the objects (the wedge on the left) could not have been
recognized without the use of partially similar features as keys.

The  wedge  on  the  right  is  the  first  to  go  (Figure  11.48).  There  isn't
much  to  say  about  that. 

The second object (Figure 11.49) is identified somehow (it is hard to
see how the lines are linked at the bottom), as a wedge. This seems
plausible enough, judging from the evidence at the top. I happen to
know that the object was a parallelepiped - but that is beside the
point.

Figure 11.50 shows the extraction of the tall, narrow parallelepiped.
Due to the way the lines are linked at the top of that object (cf. top
of Figure 11.48) the program uses the short line for the left side. The
longer neighbour is close enough to be assimilated into the object. The
bottom vertex is extrapolated, leaving one line unused. The mapping is
good enough for acceptance at this point, and the program exits without
investigating further.

Next  to  go  is  the  big  parallelepiped  (Figure  11.51).  No  mean  trick  - 
but  not  much  different  from  things  we  have  seen  before. 

The isolation of the wedge (Figure 11.52) is more interesting, since
that object contains no recognizable features. The heuristic for using
partially similar features as keys is responsible for the success in
this case, by determining that it could create recognizable features by
disregarding the shadow line at the lower vertex of the wedge. The
object can then be extracted without difficulty.

The  amended  scene  (Figure  11.53)  is  messy  enough  to  provide  the  parser 
with  one  more  possible  object,  which  is  found  in  the  shadow  effect  on 
the  front  of  the  central  object  (cf.  Figure  11.47).  It  finds  the 
parallelogram  of  a  degenerate  wedge,  the  missing  two  lines  of  which  are 
assumed,  non-directional  rays.  The  present  program  has  no  way  of 
knowing  its  mistake,  whereas  a  complete  system  (using  depth  etc.)  could 
better  realize  the  nature  of  the  situation. 

Thus  the  resulting  scene  interpretation  in  Figure  11.54  contains  that 
non-object  as  well,  which  is  basically  all  right  from  the  standpoint  of
the  present  system.  Left  in  the  residual  scene  are  the  rest  of  the 
shadow-  and  glare  lines,  a  messy  lot  which  did  not  mislead  the  program. 

This concludes the presentation of examples of system performance.


11.3 DISCUSSION OF SYSTEM PERFORMANCE

Actually,  not  much  of  a  discussion  should  be  needed  here,  since  the 
examples  are  thoroughly  commented. 

On  the  scenes  tested  so  far,  the  present  limited  system  has  performed  as
well  as  could  be  hoped.  It  is  able  to  parse  scenes  of  many  objects,  in 
the  presence  of  a  good  deal  of  disturbance.  In  fact  the  utilization  of 
partially  similar  features  as  keys  makes  it  possible  to  correctly 
identify  objects  with  only  one  good  vertex,  provided  one  of  the  lines  to 
that  vertex  is  unbroken. 

Sometimes  partially  mapped  objects  are  classified  somewhat  haphazardly, 
but  their  classifications  are  not  intended  as  final.  A  complete  system 
could  further  process  them,  since  the  details  of  their  mappings  are 
remembered. 

The  CPU-times  for  the  examples  given  above  range  from  1  to  6  minutes, 
typically  staying  around  2.  The  time  is  proportional  to  the  square  of 
the  number  of  lines,  and  roughly  to  the  squares  of  the  numbers  of 
prototypes  and  partially  mapped  objects  (since  full  matches  cause  quick 
exits).  The  dependence  of  computing  time  on  the  square  of  the 
complexity  of  the  picture  is  a  weakness  inherent  in  a  system  based  on 
models.  It  might  be  alleviated  by  the  use  of  more  extensive  feature 
schemes,  as  indicated  in  Section  12  (future  work).

Little effort has been made in the direction of speeding up the program.
It could fairly easily be modified to run substantially faster, by
programming frequently used routines directly in assembly language,
rather than in the Algol subset of SAIL (approximately Algol, plus
associative features) [Swinehart & Sproull 1971].

Let us turn now to a discussion of the possible directions in which work
on the present system might proceed.


12.0 FUTURE POSSIBILITIES

The  most  immediate  areas  of  possible  future  work  concern  extensions  of 
the  present  2D  system.  More  general  aspects  involve  the  use  of  concepts
of  3D  in  the  development  of  a  complete  vision  system.


12.1  EXTENSIONS  OF  THE  FEATURE  CONCEPTS 

The  feature  concept,  as  implemented  here,  has  the  weakness  of  demanding 
connectivity.  It  is  certainly  possible  -  and  might  even  be  worth-while  -
to  extend  it  to  certain  constellations  of  unconnected  lines,  such  as
parallel  pairs  or  triplets,  and  relationships  of  such.  The  example  of 
the  L-beam  demonstrates  the  potential  advantages  of  connectivity 
independent  features. 
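
As an indication of what a connectivity independent feature might look
like, the following Python sketch lists all pairs of scene lines that
are parallel within a tolerance, whether or not they are linked. The
representation of lines, the function names and the tolerance are
assumptions made for this illustration only.

```python
import math


def direction(seg):
    """Undirected direction of a segment, as an angle in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def angle_diff(a, b):
    """Smallest difference between two undirected directions."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def parallel_pairs(lines, tol_deg=5.0):
    """Return all pairs of line names that are parallel within tol_deg.

    lines maps a line name to ((x1, y1), (x2, y2)). No connectivity
    between the lines is required or consulted.
    """
    tol = math.radians(tol_deg)
    names = sorted(lines)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if angle_diff(direction(lines[a]), direction(lines[b])) <= tol:
                pairs.append((a, b))
    return pairs
```

Note that the pairwise comparison grows with the square of the number of
lines, so such features would have to earn their keep as mapping keys.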

Such  features  would  be  useful  as  guides  for  a  matching  supervisor 
program,  in  that  they  could  provide  an  extended  context  in  some  cases. 

Of  course  they  might  also  be  helpful  in  guiding  the  process  of  initial 
mapping. 

Two  immediate,  more  specific  possibilities  for  extensions  of  the  feature 
concept  are  the  following. 

It often happens that lines are broken up by intervening objects, or for
other reasons. Most likely the line-features of the parts will not be
recognizable. A future possibility here is to detect such cases and
insert compound lines, under the provision that the created
LF:s are recognizable. The relative messiness of such a scheme
motivates a "wait-and-see" attitude here.

The second scheme I had in mind concerns the introduction of the SF,
which here stands for "super-feature" (not San Francisco). In a
super-feature, which may be an extension of the LF concept, we would
provide (partial) information regarding the LF-designations of all the
participating lines in the feature. Such features would then reference,
directly or indirectly, all lines in simple prototypes, providing wider
contexts and extremely strong clues to mappings. Of course, line-
drawings of scenes are usually messy enough that complete SF:s would be
rare. We would almost always have to use partial ones, which is all
right.


In any case, SF:s could provide initial mappings (keys), based on much
broader contexts than do the present LF:s and CF:s. In fact, SF:s could
conceivably guide the parsing process, providing the order in which to
explore the keys. I see no use for SF:s in the matching process itself.

The present parsing program decides between different mappings on the
basis of the lines present in the initial line-drawing. In the context
of the Stanford Hand-Eye Project it is fully possible to base such
judgments on data from the original TV-image, since that system includes
a statistically based line-verifier that operates on the digitized TV-
raster [Tenenbaum 1970].

While on the subject of having recourse to the TV-image itself, I should
mention the possibility of a "closer-look" strategy. This would apply
in areas of insufficient or confusing information, and would entail
sensitivity accommodation as well as (and perhaps especially) changing
the lens into one of greater magnification. Details regarding the
technicalities of related subjects may be found in [Sobel 1970].


12.3  EXTENDED  CONTEXTS  AND  3D 

The  most  interesting  possibilities  arise  in  the  extended  context  of  3D. 

In a full-fledged three-dimensionally based system, with access to
depth-information, the basic prototypes would be given by (fictitious)
coordinates in space, and the final scene interpretation would be based
on spatial considerations, support-theorems, etc.

Figure 12.1 presents a diagram suggesting the flow of information in
such a system.

The feature scheme, and the 2D prototype matching scheme, could be much
the same as now. The 2D prototypes would be generated automatically
from models in 3-space. The prototype analyzer would generate all
different views of an object, checking each one against existing 2D
prototypes, and updating that memory structure whenever necessary.

Handy programs for the creation and manipulation of 3D
scene-representations exist already [Baumgart 1973], and those should
prove most useful in such contexts.

The prototype matching would proceed more or less as it does at present,
but the decisions of acceptance and interpretation would now, at least
in doubtful cases, be the responsibility of the 3D parsing supervisor,
with judgments based on information and theory not available to the
present parser, as well as on the specific details of the current
world-model, which describes what the environment is expected to be like
and what kinds of objects make up the world.

Depth-information may of course be obtained directly, using the laser.
Another alternative, well suited to the present system, would be to
use stereo correlation, i.e. to work in parallel on two different views,
separated by an adequate angle (from the point of view of
depth-separation).

Since the 2D parser isolates one object at a time ("best first",
basically), the task of identifying and correlating objects between the
two views should not turn out to be excessively hard.


12.4  EXTENSION  TO  GENERAL  PLANAR  FACED  OBJECTS 

Roberts (cf. Subsection 3.1) introduced the idea of representing
non-convex bodies as composed of two or more convex parts. This certainly
seems like a very sound approach, especially in a model-based system,
where self-occlusions would otherwise create great difficulties and
vastly increase the required numbers of two-dimensional prototypes.

What is needed, then, is a method of describing the junctions of convex
objects into more complex ones, so that the parser, having found the
parts, may infer the whole (in some representation). The representation
of a concave body as a collection of convex parts is at best a highly
ambiguous undertaking, which requires rigorous conventions on the part
of the prototype analyzer, and a great deal of flexibility on the part
of the parser.

It would seem that any meaningful extension to non-convex objects would
have to take place in the context of 3D, and would probably require
verification loops accessing the TV-image, since it might otherwise be
hard to determine whether we are looking at one object adjacent to
another, or just at one single, more complex body.


13.0 CONCLUSIONS

A system for intermediate-level computer vision has been developed,
which utilizes global information, in the form of two-dimensional
models, in interpreting an image as a representation of a
three-dimensional scene. The world is assumed limited to planar faced,
convex objects.

System performance seems most satisfactory. For scenes of regularly
shaped objects, such as our dear old parallelepipeds and wedges, the
present system shows good discriminatory power, even under adverse
conditions, as in the presence of disturbances like shadows, glare, and
missing lines.

The system presented here was created with the extended context of
three-dimensional interpretations in mind, and it should prove quite
readily adaptable for use in a complete vision system.


14.0 APPENDIX

It was originally my intention to include the mathematics of
least-square line-fit and vertex merging by a weighted-least-square
method here, as well as the edge-sorting algorithm, collinearity
criteria, etc.

However, this paper is long enough as it is, and I don't want to burden
it with extra details, unless unusual or otherwise interesting, which to
my mind precludes the above-mentioned. I shall be content to give some
account of the basic data-structure of the present system.


14.1  THE  GENERAL  DATA-STRUCTURE 

This  presentation  is  intended  to  provide  some  general  principles  rather 
than  implementation  details.  It  will  not  deal  with  the  data-structures 
pertinent  to  features,  prototypes,  or  mappings,  since  those  were 
discussed  in  their  proper  contexts. 

Some of the important considerations behind the design of the
data-structure were:

Easy random access
List structure for context
An absolute minimum of shuffling
Ability to expand if needed

The data-structure was designed some time before I developed the present
feature- and prototype schemes. It is of a general-purpose character
(within its framework), and has proven efficient and fairly easy to work
with.

The  scene-data  is  classified  into  three  basic  groups  of  pertinence, 
namely  lines,  line-ends,  and  vertices. 

The information pertaining to lines is of a more or less physical
nature, such as coordinates, coefficients in equation, angular argument,
basis in edge-data, etc. A very important item associated with each
line is its LCREDE (Line CREation and DELetion) value, to which I shall
return below.

Associated with each line is also the linkage of its ends. Those are
referred to as SV:s (simple vertices), and they figure mainly in the
context of the list structure providing vertex linkages. Thus, for each
SV, we have a pointer to its orbital successor line(-end), and the angle
to that line, ccw. around the vertex in question.
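
The orbital linkage can be pictured with a short sketch. It assumes
that each line-end carries the angular argument of its line as seen from
the vertex. The record layout, the field names, and the function
link_ring are hypothetical and only illustrate the idea of a
counterclockwise ring of successor pointers.

```python
import math
from dataclasses import dataclass

@dataclass
class SimpleVertex:
    line: int                        # index of the line this end belongs to
    direction: float                 # angular argument at the vertex, radians
    successor: int = -1              # index of the next SV counterclockwise
    angle_to_successor: float = 0.0  # ccw angle to that successor, radians

def link_ring(svs, members):
    """Link the SVs whose indices are in `members` into a ccw ring.

    Returns the index of one SV in the ring, or -1 for an empty ring.
    A ring with a single member points to itself with angle 0.
    """
    two_pi = 2.0 * math.pi
    ordered = sorted(members, key=lambda i: svs[i].direction % two_pi)
    for here, nxt in zip(ordered, ordered[1:] + ordered[:1]):
        svs[here].successor = nxt
        svs[here].angle_to_successor = (
            svs[nxt].direction - svs[here].direction) % two_pi
    return ordered[0] if ordered else -1
```

Walking the successor pointers from any member visits every line-end at
the vertex once, in counterclockwise order, without any sorting at
lookup time.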

Normal vertices, where several lines come together, are called CV:s
(compound vertices), and each SV also has a pointer to the CV (if any),
of which it is a member. Risking confusion, I hesitate to add that CV:s
may be single, as well. With each CV is associated a pointer to one of
the SV:s in its ring, and also physical coordinates, which are obtained
through a weighted-least-square method I developed (which, incidentally,
can be used to obtain perspective vanishing points, as well). The method
minimizes the sum of the squared distances from the point to the lines,
each weighted by the square root of its line-length.

To sum up, SV:s are line-ends, and CV:s are vertices, and those
structures are completely separate. The linkages define the
interpretation of the scene-representation.
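
The weighted-least-square vertex estimate mentioned above can be
sketched as follows. This is one straightforward reading of the stated
criterion, not the derivation omitted from this appendix: each line is
put in normal form, weighted by the square root of its length, and the
2x2 normal equations are solved directly. The function name merge_vertex
and the segment representation are assumptions.

```python
import math

def merge_vertex(lines):
    """Point minimizing sum(sqrt(length_i) * distance_i**2) over the lines.

    `lines` is a list of segments ((x0, y0), (x1, y1)). Each segment is
    extended to an infinite line n . p = c with unit normal n.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (x0, y0), (x1, y1) in lines:
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0.0:
            continue  # a degenerate segment defines no line
        nx, ny = -(y1 - y0) / length, (x1 - x0) / length
        c = nx * x0 + ny * y0
        w = math.sqrt(length)
        # Accumulate the normal equations (sum w n n^T) p = sum w c n.
        sxx += w * nx * nx
        sxy += w * nx * ny
        syy += w * ny * ny
        bx += w * c * nx
        by += w * c * ny
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel or too few to fix a point")
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)
```

Applied to lines that are parallel in space but converge in the image,
the same computation gives an estimate of their vanishing point.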

Each element, be it a line, an SV, or a CV, therefore has a fixed
amount of storage associated with it, which makes its components
directly addressable, even though in some cases it is also a member of a
list structure. Furthermore, deleted elements are linked into
free-storage lists, so that no shuffling is needed except when core has
to be expanded. This happens when the information content (or the
messiness) of a scene exceeds expected bounds.
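
A minimal sketch of such a storage scheme, assuming records addressed by
index and a chain of free slots. The class Pool and its method names are
hypothetical, and the actual core layout of the present system is not
shown here.

```python
class Pool:
    """Fixed-size records addressed by index, with a free-storage list."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.records = [None] * capacity
        # next_free[i] chains the unused slots; -1 ends the chain.
        self.next_free = list(range(1, capacity)) + [-1]
        self.free_head = 0

    def allocate(self, record):
        """Store a record in a free slot and return its index."""
        if self.free_head == -1:
            self._expand()
        index = self.free_head
        self.free_head = self.next_free[index]
        self.records[index] = record
        return index

    def release(self, index):
        """Delete a record by linking its slot into the free list."""
        self.records[index] = None
        self.next_free[index] = self.free_head
        self.free_head = index

    def _expand(self):
        """Double the storage; existing indices stay valid."""
        old = len(self.records)
        new = old * 2
        self.records.extend([None] * (new - old))
        self.next_free.extend(list(range(old + 1, new)) + [-1])
        self.free_head = old
```

Because a released slot is reused in place, the indices held by other
records (for instance the SV and CV pointers) never have to be updated
when an element is deleted.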


14.2  THE  SUBCONSCIOUS 

The above-mentioned LCREDE defines the status of a line, i.e. whether it
is part of free storage (no line at all), inactivated, or currently
active. It also contains a short, two-level, memory of recent states.

Actually,  the  top  of  LCREDE  only  defines  status  in  relation  to  two 
global  variables  which  define  the  current  range  of  the  conscious.  By 
changing  those  global  values,  we  may  forget  parts  of  the  scene,  and 
bring  other  parts  to  the  surface. 

As an added possibility for diversity (confusion), we may have vertex
connections temporary or permanent, as defined by the signs of the
SV-pointers (orbit-pointers). This has not yet been utilized in the
present system, where all links are temporarily permanent.

(I beg your pardon..?)

There is a vast library of subroutines that perform various exotic
actions on the scene-representation, and those (when called properly)
can be instructed to work only in the subconscious, or the conscious, or
the temporarily subconscious, or ... For the present, most routines are
instructed to work in the temporarily permanent conscious, that is with
active lines only.

The two globals defining the current range of the conscious are
manipulated by the parser, the matcher, and various other programs, in
the processing of a scene. Since each LCREDE is a short memory stack,
lines may be temporarily forgotten (by pushing down), or conveniently
recalled (by popping the stack). This possibility is used extensively
in the matcher, which is very busy replacing lines or line-pairs in cases
of collinearities or plain substitutions.
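
The mechanism may be easier to follow from a small sketch. It assumes
that each state is an integer, that a reserved value marks free storage,
and that a line is conscious when the top of its stack lies within the
two global bounds. The encoding, the class Lcrede, and the function
is_conscious are all hypothetical and do not reproduce the actual
representation of LCREDE.

```python
FREE = 0  # assumed marker: the slot is in free storage (no line at all)

class Lcrede:
    """Current state of a line plus a short memory of earlier states."""

    MEMORY = 2  # number of remembered earlier states (the two-level memory)

    def __init__(self, state=FREE):
        self.stack = [state]

    @property
    def top(self):
        return self.stack[-1]

    def push(self, state):
        """Enter a new state, keeping at most MEMORY older ones."""
        self.stack.append(state)
        del self.stack[:-(self.MEMORY + 1)]

    def pop(self):
        """Recall the previous state, if one is remembered."""
        if len(self.stack) > 1:
            self.stack.pop()
        return self.top

def is_conscious(lcrede, low, high):
    """A line is active if its top state lies in the conscious range."""
    return lcrede.top != FREE and low <= lcrede.top <= high
```

With the range set to (1, 5), a line in state 3 is active. Pushing a
state outside the range forgets it, popping recalls it, and changing the
two bounds alone hides or reveals whole groups of lines without touching
any individual record.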

I  have  found  this  system  very  flexible  and  efficient,  especially  in  the 
context  of  parsing  and  matching,  where  the  scene  (or  the  current  object) 
is  subject  to  continual  change. 

This should be enough.


"COMPUTER RECOGNITION OF THREE-DIMENSIONAL OBJECTS
IN A VISUAL SCENE"

MAC-TR-59
MIT Project MAC
December 1968

HUECKEL M

"AN OPERATOR WHICH LOCATES EDGES
IN DIGITIZED PICTURES"

Journal of the ACM (pp. 113-125)
January 1971


HUECKEL M

"A LOCAL OPERATOR WHICH RECOGNIZES EDGES AND LINES"

Journal of the ACM
To be published 1973


HUFFMAN D A

"LOGICAL ANALYSIS OF PICTURES OF POLYHEDRA"

AI Group, Technical Note No. 6
SRI Project 7494
May 1969


ORBAN  R 

"REMOVING  SHADOWS  IN  A  SCENE" 

AI  Memo  192 
MIT  AI  Laboratory 
August  1970 


PINGLE K K & TENENBAUM J M

"AN ACCOMMODATING EDGE FOLLOWER"

Proc. IJCAI
September 1971


ROBERTS L G

"MACHINE PERCEPTION OF THREE-DIMENSIONAL SOLIDS"

Technical Report No. 315
MIT Lincoln Laboratory
May 1963 (reissued May 1965)


SHIRAI Y

"A HETERARCHICAL PROGRAM FOR RECOGNITION
OF POLYHEDRA"

AI Memo No. 263
MIT AI Laboratory
June 1972



SOBEL I

"CAMERA MODELS AND MACHINE PERCEPTION"

Stanford AI Memo AIM-121
Stanford University
May 1970


SWINEHART D C & SPROULL R F

"SAIL"

Stanford AI Project Operating Note No. 57.2
Stanford University
January 1971


TENENBAUM J M

"ACCOMMODATION IN COMPUTER VISION"

Stanford AI Memo AIM-134
Stanford University
October 1970


UNDERWOOD S A & COATES C L

"VISUAL LEARNING AND RECOGNITION BY COMPUTER"

Technical Report No. 123
Information Systems Research Laboratory
Electronics Research Center
University of Texas at Austin
April 1972


WALTZ D L

"GENERATING SEMANTIC DESCRIPTIONS FROM
DRAWINGS OF SCENES WITH SHADOWS"

AI TR-271
MIT AI Laboratory
November 1972


WINSTON P H

"LEARNING STRUCTURAL DESCRIPTIONS
FROM EXAMPLES"

MAC-TR-76
MIT Project MAC
September 1970




